INDEX
Explanations
proper nouns, specifically names of individuals
mentions of specific individuals or names in the text
Negative Logits
ALLY          -0.77
NetMessage    -0.73
opausal       -0.72
eers          -0.71
netflix       -0.70
Disneyland    -0.68
Kinnikuman    -0.64
naissance     -0.64
ĺħ            -0.63
ISE           -0.62
Positive Logits
atche         1.07
rite          0.97
mus           0.93
ani           0.87
ans           0.86
aku           0.86
inx           0.85
aji           0.85
onso          0.85
arah          0.83
Activations Density 0.030%
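
As a rough illustration of how a density figure like the one above can be read, the sketch below assumes that activation density means the fraction of sampled token positions on which the feature's activation is above zero. That definition, the function name `activation_density`, and the sample data are all assumptions for illustration and are not taken from this page.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumed definition: density = (# active positions) / (# positions sampled).
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: the feature fires on 3 of 10,000 token positions.
sample = [0.0] * 9997 + [1.2, 0.4, 2.5]
density = activation_density(sample)
print(f"{density:.3%}")  # prints 0.030%
```

Under this reading, a density of 0.030% would mean the feature is active on roughly 3 of every 10,000 sampled tokens, which fits a sparse feature that responds only to specific kinds of names.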