    Explanations

    words or phrases related to damage and injury in various contexts

    Negative Logits
    Token              Logit
    " cogn"            -0.20
    " hod"             -0.14
    "Fr"               -0.14
    " Rol"             -0.14
    " Talent"          -0.14
    "αι"               -0.14
    "TA"               -0.13
    "ton"              -0.13
    "etr"              -0.13
    "اÛĮÙĩ"            -0.13
    Positive Logits
    Token              Logit
    "outu"             0.16
    "reme"             0.16
    "LabelText"        0.15
    "\Bridge"          0.15
    ".scalablytyped"   0.15
    "↵↵"               0.14
    "åľĨ"              0.14
    "anden"            0.14
    "PLUGIN"           0.14
    "_smooth"          0.14
    Activation Density: 0.006%

    No Known Activations