Explanations
repetitive or frequently mentioned nouns
Negative Logits
ned      -0.20
igger    -0.15
etto     -0.15
avou     -0.15
ernet    -0.15
bote     -0.14
gan      -0.14
inka     -0.14
ess      -0.14
mund     -0.14
Positive Logits
uded      0.17
sse       0.15
ëłĩ       0.15
ws        0.15
ISON      0.15
wig       0.14
лÑı       0.14
ison      0.14
atk       0.14
ToProps   0.13
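Logit lists like the two above are commonly produced by projecting a feature's decoder direction through the model's unembedding matrix and reading off the most negative and most positive entries. The sketch below assumes that approach; the names `w_dec_row`, `W_U`, and `vocab` are illustrative placeholders, not this dashboard's actual code.

```python
import numpy as np

def feature_logit_effects(w_dec_row, W_U, vocab, k=10):
    """Hypothetical helper: direct logit effect of one SAE feature.

    w_dec_row: (d_model,) decoder vector for the feature (assumed input)
    W_U:       (d_model, d_vocab) unembedding matrix (assumed input)
    vocab:     list of d_vocab token strings
    Returns (negative, positive): k (token, logit) pairs from each end.
    """
    # Effect of writing this feature's direction on every token's logit.
    logits = w_dec_row @ W_U
    # Indices sorted from most negative to most positive.
    order = np.argsort(logits)
    negative = [(vocab[i], float(logits[i])) for i in order[:k]]
    positive = [(vocab[i], float(logits[i])) for i in order[::-1][:k]]
    return negative, positive
```

With a real model, `k=10` would yield two ten-entry lists in the same shape as the ones shown here.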
Activations Density 0.034%
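Activation density is usually the percentage of sampled tokens on which the feature fires. A density of 0.034% works out to about 34 tokens per 100,000, or roughly 1 in 2,941. The sketch below assumes "fires" means an activation strictly above zero; the threshold this dashboard actually uses is not stated.

```python
import numpy as np

def activation_density(acts, threshold=0.0):
    """Percentage of tokens whose feature activation exceeds `threshold`.

    acts: array of per-token activations for one feature (assumed input).
    The zero threshold is an assumption, not a documented setting.
    """
    acts = np.asarray(acts)
    return 100.0 * np.count_nonzero(acts > threshold) / acts.size
```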