    Negative Logits
    " hostage"       -0.07
    " uncovered"     -0.07
    " bul"           -0.07
    " Cultural"      -0.07
                     -0.07
    " Ін"            -0.07
    "(sign"          -0.06
    " misog"         -0.06
    " PROM"          -0.06
    " apologized"    -0.06

    Positive Logits
    "	fire"         0.07
    " Mathf"          0.07
    "_CTRL"           0.07
    " світі"          0.07
    "_METHOD"         0.06
                      0.06
    "DED"             0.06
    " ascent"         0.06
    " seulement"      0.06
    ".hpp"            0.06

    Act Density 0.021%

    No Known Activations