    Explanations

    expressions related to social or political actions and their consequences

    Negative Logits
    íĨµ       -0.16
     Arts    -0.15
    ebek     -0.15
    §        -0.14
    %S       -0.14
    falls    -0.14
    agra     -0.14
    âĢĮس    -0.14
    ัà¸ķร    -0.14
     Basic   -0.13
    Positive Logits
    nice     0.14
    agues    0.14
    šť       0.14
    _PTR     0.14
    walk     0.14
    atron    0.14
    apest    0.14
    olson    0.14
    rowad    0.14
    asel     0.14
    Activation Density 0.440%

    No Known Activations