Explanations
frequent articles and pronouns in the text
Negative Logits
anza      -0.16
culos     -0.15
izedName  -0.15
bert      -0.15
ansen     -0.15
ano       -0.14
ENCH      -0.14
encent    -0.14
isan      -0.13
ÌĢ        -0.13
Positive Logits
Lud       0.17
Mine      0.14
ahead     0.14
è¼        0.14
Dud       0.14
Mort      0.14
ered      0.14
alike     0.13
ìłĢ       0.13
vey       0.13
Activations Density 0.233%