    Negative Logits
    (blank)         -0.07
    " spont"        -0.07
    " Sniper"       -0.07
    "уч"            -0.07
    (blank)         -0.07
    " foot"         -0.07
    " leg"          -0.07
    " Manning"      -0.06
    " prejudice"    -0.06
    "Buff"          -0.06
    Positive Logits
    " scale"        0.22
    " Scale"        0.17
    " scales"       0.12
    " SCALE"        0.11
    "scale"         0.11
    "-scale"        0.10
    "Scale"         0.09
    "\tscale"       0.08
    "cales"         0.07
    "CALE"          0.06
    Activation Density: 0.014%

    No Known Activations