INDEX
    Explanations
    NEGATIVE LOGITS
    "theless"    -0.71
    "dash"       -0.71
    "feet"       -0.65
    " belt"      -0.63
    "belt"       -0.63
    " Mub"       -0.60
    " Hats"      -0.59
    " naked"     -0.58
    "bed"        -0.58
    "pher"       -0.58
    POSITIVE LOGITS
    "ations"     1.23
    "ational"    1.20
    "ants"       1.16
    "ables"      1.15
    "ation"      1.14
    "estate"     1.08
    "ant"        1.06
    "ruction"    1.03
    "Ģ"          1.03
    "omy"        1.01
    Activation Density: 0.035%

    No Known Activations