INDEX
Explanations
mentions of names, possibly with a specific focus on names starting with or containing "Rita," "Anita," "Juanita," "Conan," and "Doyle."
proper nouns, particularly names of individuals and locations
Negative Logits
Token    Logit
yg       -0.95
erate    -0.88
s        -0.87
er       -0.87
cript    -0.87
sin      -0.83
edo      -0.82
iful     -0.81
ijah     -0.81
son      -0.81
Positive Logits
Token       Logit
Literature  0.71
pling       0.69
Rin         0.69
Moreno      0.69
odan        0.68
plings      0.63
Cout        0.62
Pole        0.61
Plate       0.60
Scotia      0.60
Activations Density 0.172%
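
The density figure above is commonly read as the share of token positions on which the feature fires at all. The page does not say how it was computed, so the following is only a minimal sketch under that assumption. The function name `activation_density`, the `threshold` parameter, and the sample activations are all hypothetical and are not taken from this page.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of positions with a non-zero
    (above-threshold) activation. The source page does not define it.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: 1000 token positions, 3 of which activate the feature.
sample = [0.0] * 997 + [1.2, 0.4, 3.1]
density = activation_density(sample)
print(f"Activations Density {density:.3%}")
```

Under this reading, a density of 0.172% would mean the feature is active on roughly 17 of every 10,000 token positions, which fits a feature that responds to a narrow set of names.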