Explanations
References to identity or self-referential phrases
NEGATIVE LOGITS
erunner           -0.63
hunne             -0.63
astify            -0.61
дописавши         -0.60  (Russian: "having finished writing")
iastes            -0.59
ENOS              -0.57
AppCompatTheme    -0.57
Racine            -0.56
tdc               -0.56
brady             -0.55
POSITIVE LOGITS
itself            1.38
itself            1.34
Itself            1.29
Roskov            0.98
sendiri           0.95  (Indonesian/Malay: "oneself, by oneself")
himself           0.91
Himself           0.87
本身              0.86  (Chinese: "itself, oneself")
herself           0.86
themselves        0.84
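Dashboards of this kind commonly derive the logit lists by projecting the feature's decoder direction onto each token's unembedding vector and ranking the results. The sketch below assumes that method (the source does not state how these values were computed); the helpers `logit_effects` and `top_and_bottom` and the 2-d vectors are hypothetical toy data, not this feature's weights.

```python
def logit_effects(decoder_dir, unembed):
    """Dot product of the (assumed) decoder direction with each token's unembedding vector."""
    return {tok: sum(d * u for d, u in zip(decoder_dir, vec))
            for tok, vec in unembed.items()}

def top_and_bottom(effects, k):
    """Return the k most positive and the k most negative (token, value) pairs."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])  # ascending by value
    return ranked[::-1][:k], ranked[:k]

# Hypothetical 2-d toy vectors; a real model uses the full residual-stream width.
decoder_dir = [1.0, 0.0]
unembed = {
    "itself": [1.0, 0.5],
    "himself": [0.9, 0.0],
    "brady": [-0.5, 1.0],
    "tdc": [-0.6, 0.0],
}
positive, negative = top_and_bottom(logit_effects(decoder_dir, unembed), k=2)
print(positive)  # [('itself', 1.0), ('himself', 0.9)]
print(negative)  # [('tdc', -0.6), ('brady', -0.5)]
```

Under this assumption, reflexive pronouns rank highest because their unembedding vectors point most nearly along the feature's decoder direction.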
Activation density: 0.134%
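Activation density is usually reported as the percentage of tokens on which the feature fires, meaning its activation is above zero. A minimal sketch, assuming that definition and a plain list of per-token activations. `activation_density` is a hypothetical helper, and the sample data is invented to reproduce the figure above.

```python
def activation_density(activations, threshold=0.0):
    """Percentage of tokens whose activation exceeds `threshold` (assumed to be 0)."""
    if not activations:
        return 0.0
    fired = sum(1 for a in activations if a > threshold)
    return 100.0 * fired / len(activations)

# Hypothetical sample: 134 firing tokens out of 100,000.
acts = [0.0] * 99_866 + [2.5] * 134
print(f"{activation_density(acts):.3f}%")  # 0.134%
```

Read this way, a density of 0.134% means the feature is active on roughly 1 token in 750, which fits a narrow feature tied to reflexive and self-referential words.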