INDEX
Explanations
repeated references to individuals, particularly in terms of their actions and roles
Negative Logits
izontal     -0.14
tical       -0.14
saf         -0.14
arakter     -0.14
stantial    -0.14
_meter      -0.13
untu        -0.13
acional     -0.13
ymous       -0.13
oro         -0.13
Positive Logits
chy         0.15
idelberg    0.15
iag         0.15
alic        0.15
-story      0.14
adic        0.14
ewith       0.14
chia        0.14
.gl         0.13
kel         0.13
Activations Density: 0.010%