INDEX
Explanations
phrases emphasizing self-reference or introspective concepts
Negative Logits
nakalista       -0.78
########.       -0.71
kürzlich        -0.70
recentemente    -0.68
récemment       -0.68
enuta           -0.67
daglig          -0.66
itinéraires     -0.65
zepine          -0.65
lenker          -0.64
Positive Logits
itself          2.29
itself          2.09
Itself          1.92
themselves      1.21
sich            1.09
themselves      1.08
本身            0.98
itſelf          0.95
zich            0.90
zichzelf        0.81
Activations Density 0.103%