    Negative Logits
    Token           Logit
    " Straw"        -0.09
    "bear"          -0.08
    " rough"        -0.08
    " crude"        -0.08
    "ежде"          -0.08
    " Firstly"      -0.08
    " defe"         -0.08
    " Rough"        -0.07
    " TAC"          -0.07
    " TD"           -0.07

    Positive Logits
    Token           Logit
    " tourists"      0.08
    " shaken"        0.08
    " Jews"          0.07
    "instructions"   0.07
    "_acc"           0.07
    " publishers"    0.07
    "Parents"        0.07
    "PRE"            0.07
    "Drops"          0.07
    " intrigued"     0.07

    Activation Density: 0.000%

    No Known Activations