Explanations
the presence of specific named individuals or characters, particularly in literature or media
Negative Logits
Token    Value
Rough    -0.17
urr      -0.16
eph      -0.15
ene      -0.15
못       -0.15
oogle    -0.15
rough    -0.15
KN       -0.15
OM       -0.15
scr      -0.14
Positive Logits
Token    Value
жа       0.17
teil     0.16
BOOLE    0.15
izona    0.15
sted     0.15
ζα       0.14
assa     0.14
ADIUS    0.14
ourke    0.14
dit      0.14
Activations Density: 0.077%