Explanations
names, particularly last names
proper nouns, specifically names and initials
Negative Logits
HCR       -0.59
[];       -0.56
reau      -0.53
VB        -0.52
ワ        -0.52
sburgh    -0.52
INA       -0.52
ruary     -0.52
075       -0.50
inus      -0.50
Positive Logits
ibaba      0.65
Hasan      0.59
akh        0.57
omaly      0.56
azi        0.52
utsch      0.52
Holocaust  0.51
bg         0.51
ulk        0.50
Muslims    0.49
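The dashboard does not say how these two lists are produced. A common approach in sparse-autoencoder dashboards, assumed here, is to project the feature's decoder direction through the model's unembedding matrix, then report the k most negative and k most positive token logits. The sketch below is a minimal pure-Python version of that idea; the function name `logit_effects` and its parameters are hypothetical, and it ignores details such as the final layer norm that a real pipeline may apply.

```python
from typing import List, Tuple

def logit_effects(
    decoder_dir: List[float],
    unembed: List[List[float]],
    vocab: List[str],
    k: int = 10,
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Hypothetical helper: project a feature's decoder direction through an
    unembedding matrix (shape d_model x n_vocab) and return the k most
    negative and k most positive (token, logit) pairs.

    Assumption: this mirrors how dashboards typically compute logit lists;
    it is not confirmed by the source.
    """
    d_model = len(decoder_dir)
    # logits[v] = sum_i decoder_dir[i] * unembed[i][v]
    logits = [
        sum(decoder_dir[i] * unembed[i][v] for i in range(d_model))
        for v in range(len(vocab))
    ]
    # Sort ascending by logit value.
    ranked = sorted(zip(vocab, logits), key=lambda pair: pair[1])
    negative = ranked[:k]        # most negative first
    positive = ranked[::-1][:k]  # most positive first
    return negative, positive
```

For example, with a toy two-dimensional direction `[1.0, -0.5]`, an unembedding `[[0.2, -0.4, 0.6], [0.1, 0.8, -0.2]]`, and vocabulary `["a", "b", "c"]`, the token logits are roughly 0.15, -0.8, and 0.7, so `b` heads the negative list and `c` heads the positive list.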
Activation density: 0.149%
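Activation density is usually read as the fraction of token positions at which the feature fires. The sketch below assumes "fires" means an activation strictly above zero; the threshold and the helper name `activation_density` are assumptions, not taken from the dashboard. Under that reading, 0.149% corresponds to about 149 active positions per 100,000 tokens.

```python
from typing import Iterable

def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Hypothetical helper: fraction of token positions whose feature
    activation exceeds `threshold` (assumed to be 0.0 by default)."""
    total = 0
    active = 0
    for a in activations:
        total += 1
        if a > threshold:
            active += 1
    if total == 0:
        raise ValueError("no activations supplied")
    return active / total
```

Multiplying the result by 100 gives the percentage shown above; a stream of 100,000 activations with 149 positive values yields 0.00149, that is, 0.149%.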