INDEX
    Explanations

    short phrases or sentences expressing physical actions or states

    negative comments or criticisms about individuals

    Negative Logits
    "iosyncr"         -0.67
    "uates"           -0.66
    "osponsors"       -0.66
    "SpaceEngineers"  -0.65
    " ongoing"        -0.61
    " conduc"         -0.60
    "erenn"           -0.58
    " coincides"      -0.57
    " Flavoring"      -0.56
    " preliminary"    -0.56

    Positive Logits
    " he"              1.20
    "He"               1.20
    "Was"              1.10
    " Didn"            1.08
    " He"              1.07
    "didn"             1.06
    "he"               1.02
    " didnt"           1.02
    "Had"              1.02
    "his"              0.98

    Activation Density 0.537%

    No Known Activations