    Explanations

    statements expressing moral judgments or criticisms

    Negative Logits
    " tamp"       -0.15
    "ÏĢι"         -0.15
    "erap"        -0.15
    "illez"       -0.14
    "ohn"         -0.14
    "artz"        -0.14
    "airo"        -0.14
    "[to"         -0.14
    "oce"         -0.14
    "OMET"        -0.14
    Positive Logits
    "IVA"          0.15
    "ứ"            0.15
    "utter"        0.15
    "olarity"      0.15
    "orp"          0.15
    "sorry"        0.14
    "ivos"         0.14
    " simply"      0.14
    "ivial"        0.14
    " kimse"       0.14
    Activation Density: 0.380%

    No Known Activations