Explanations
names belonging to individuals, particularly those related to film and sports
Negative Logits
Token         Logit
allax         -0.15
lep           -0.15
stim          -0.15
rish          -0.15
rians         -0.15
ılıç          -0.15
letcher       -0.14
olon          -0.14
icates        -0.14
usercontent   -0.13
Positive Logits
Token         Logit
ne            0.23
apolis        0.21
thers         0.18
pike          0.17
ettes         0.17
TEM           0.16
hill          0.16
ette          0.16
nest          0.15
adel          0.15
Activations Density 0.012%