INDEX
    Explanations

    elements related to social dynamics and problematic behavior

    Negative Logits
     either
    -0.59
     Either
    -0.54
    either
    -0.52
    Either
    -0.50
     EITHER
    -0.48
     либо
    -0.28
     soit
    -0.25
    ither
    -0.20
     OTHERWISE
    -0.18
    ichever
    -0.18
    Positive Logits
     nor
    0.94
    nor
    0.66
     Nor
    0.63
     NOR
    0.58
    Nor
    0.57
     nors
    0.35
     нор
    0.29
     neither
    0.29
     né
    0.29
     Norris
    0.28
    Activation Density: 0.021%

    No Known Activations