    Negative Logits
    олай          -0.07
     patriotism   -0.06
     götür        -0.06
     rivers       -0.06
     trash        -0.06
     frosting     -0.06
    	RT        -0.06
    aaaaaaaa      -0.06
    923           -0.06
    .dll          -0.06
    Positive Logits
    (enemy        0.08
    ्ज            0.07
    ัวเอง         0.07
     via          0.06
    _same         0.06
    ffd           0.06
    '],↵↵         0.06
     массив       0.06
     Scalars      0.06
     =            0.06
    Activation Density: 0.004%

    No Known Activations