INDEX
Explanations
people's names, particularly first names
names of individuals, particularly those starting with the letter 'C'
NEGATIVE LOGITS
Token        Logit
ACTED        -0.67
Haitian      -0.63
_-           -0.60
theless      -0.60
Romanian     -0.59
dime         -0.57
Seym         -0.56
ropolitan    -0.55
█            -0.54
׾            -0.54
POSITIVE LOGITS
Token        Logit
acca          0.90
ohan          0.81
reau          0.79
lear          0.76
anyon         0.70
aney          0.69
grain         0.68
ensen         0.68
orne          0.68
rouse         0.68
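Dashboards of this kind commonly produce the two logit lists by projecting the feature's decoder direction through the model's unembedding matrix and keeping the highest- and lowest-scoring vocabulary tokens. The sketch below assumes exactly that computation; the function names `logit_effects` and `top_and_bottom` and the toy inputs are hypothetical, and the real dashboard may normalize or otherwise post-process the scores.

```python
# Hypothetical sketch: rank vocabulary tokens by a feature's direct logit effect,
# ASSUMING effect(token) = dot(decoder_direction, W_U[:, token]).
from typing import List, Sequence, Tuple


def logit_effects(decoder_dir: Sequence[float],
                  unembed: Sequence[Sequence[float]],
                  vocab: Sequence[str]) -> List[Tuple[str, float]]:
    """Return (token, effect) pairs; `unembed` is d_model rows x vocab_size columns."""
    effects = []
    for j, tok in enumerate(vocab):
        # Dot product of the decoder direction with column j of the unembedding.
        score = sum(d * unembed[i][j] for i, d in enumerate(decoder_dir))
        effects.append((tok, score))
    return effects


def top_and_bottom(effects: List[Tuple[str, float]], k: int):
    """Return (k most positive, descending), (k most negative, ascending)."""
    ranked = sorted(effects, key=lambda pair: pair[1])
    return ranked[::-1][:k], ranked[:k]


# Toy example with a 2-dimensional residual stream and a 3-token vocabulary.
effects = logit_effects([1.0, 0.0],
                        [[0.9, -0.6, 0.1],
                         [0.5, 0.5, 0.5]],
                        ["acca", "Haitian", "the"])
positive, negative = top_and_bottom(effects, 1)
print(positive, negative)  # [('acca', 0.9)] [('Haitian', -0.6)]
```

Sorting once and slicing from both ends keeps the positive list in descending order and the negative list with the most negative token first, matching how the tables above are ordered.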
Activation density: 0.129%
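Activation density is usually reported as the percentage of token positions on which the feature fires, i.e. has an activation above zero. The following is a minimal sketch under that assumption; the helper name `activation_density` and the default threshold of 0.0 are illustrative choices, not taken from the dashboard.

```python
# Hypothetical sketch: percentage of token positions where a feature is active,
# ASSUMING "active" means activation > threshold (default 0.0).
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    total = 0
    active = 0
    for a in activations:
        total += 1
        if a > threshold:
            active += 1
    if total == 0:
        raise ValueError("no activations supplied")
    return 100.0 * active / total


# For instance, 129 active positions out of 100,000 give a density of 0.129%.
print(activation_density([1.0] * 129 + [0.0] * (100_000 - 129)))
```

A density this low means the feature is silent on the vast majority of tokens, which fits a narrow explanation such as a particular class of names.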