Explanations
proper nouns, particularly names of people
mentions of individuals or characters associated with specific names
Negative Logits
ablishment    -0.86
omorphic      -0.86
ヤ            -0.79
este          -0.78
uers          -0.76
place         -0.75
ainted        -0.73
areth         -0.72
bris          -0.71
ges           -0.71
Positive Logits
Thornton      1.11
nton          1.06
Sharks        0.98
Farrell       0.91
icum          0.84
Kubrick       0.77
Smy           0.75
Dough         0.74
Bore          0.71
Tomas         0.70
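The dashboard does not say how these two lists were produced. A common approach, assumed here rather than confirmed by the source, is to multiply the feature's decoder direction by the model's unembedding matrix and then read off the tokens with the largest and smallest resulting logits. The sketch below shows that projection with toy numbers; the function names `feature_logits` and `top_and_bottom`, the matrix layout, and the tiny three-token vocabulary are all hypothetical.

```python
from typing import List, Sequence, Tuple


def feature_logits(direction: Sequence[float],
                   w_u: Sequence[Sequence[float]]) -> List[float]:
    """Project a feature's decoder direction (length d_model) through an
    unembedding matrix w_u laid out as d_model rows x vocab columns.
    Hypothetical helper; real models store these as large tensors."""
    if len(direction) != len(w_u):
        raise ValueError("direction length must equal the number of rows in w_u")
    vocab_size = len(w_u[0])
    # logit[j] = sum_i direction[i] * w_u[i][j]  (a vector-matrix product)
    return [sum(d * row[j] for d, row in zip(direction, w_u))
            for j in range(vocab_size)]


def top_and_bottom(tokens: Sequence[str], logits: Sequence[float],
                   k: int) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most positive and the k most negative (token, logit) pairs."""
    ranked = sorted(zip(tokens, logits), key=lambda pair: pair[1])
    positive = ranked[::-1][:k]  # largest logits first
    negative = ranked[:k]        # smallest (most negative) logits first
    return positive, negative


# Toy example: 2-dimensional residual stream, 3-token vocabulary.
direction = [1.0, -0.5]
w_u = [[2.0, 0.0, -1.0],
       [0.0, 1.0, 0.5]]
tokens = ["A", "B", "C"]
logits = feature_logits(direction, w_u)
positive, negative = top_and_bottom(tokens, logits, k=1)
print(positive, negative)  # [('A', 2.0)] [('C', -1.25)]
```

Under this assumption, a positive entry such as "Thornton 1.11" would mean the feature's direction pushes the model toward predicting that token next, and a negative entry pushes away from it.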
Activation density: 0.028%
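Activation density is usually read as the fraction of tokens on which the feature fires. The sketch below assumes "fires" means an activation strictly above zero; the source states neither the threshold nor the sample of text the figure was measured over, and the `activation_density` helper is hypothetical. At 0.028%, the feature would be active on roughly 28 of every 100,000 tokens.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Fraction of tokens on which the feature's activation exceeds threshold.
    The default threshold of 0.0 is an assumption, not a documented setting."""
    if not activations:
        raise ValueError("need at least one activation")
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Toy example: a feature that fires on 28 of 100,000 tokens.
acts = [0.0] * 99_972 + [1.0] * 28
density = activation_density(acts)
print(f"{density:.3%}")  # 0.028%
```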