INDEX
Explanations
names of particular individuals
proper nouns, specifically names of people or notable figures
Negative Logits
Raider      -0.79
????????    -0.78
cock        -0.76
yy          -0.73
hawk        -0.73
marine      -0.73
bikini      -0.72
hump        -0.72
mallow      -0.70
Coffin      -0.70
Positive Logits
Tayyip      1.03
Nicol       0.84
atively     0.81
guyen       0.78
ensibly     0.77
encies      0.77
lev         0.77
Recep       0.76
citiz       0.76
inders      0.76
Activations Density 0.009%