INDEX
Explanations
words related to names of places or people
names or variants related to individuals, particularly with the letter pattern "il"
Negative Logits
elig          -0.80
lished        -0.62
ultras        -0.60
TRY           -0.60
bluff         -0.60
entrants      -0.58
horr          -0.57
nightmares    -0.57
$$$$          -0.56
giveaway      -0.56
Positive Logits
igans         1.00
felt          0.79
idan          0.75
kil           0.73
nian          0.73
sworth        0.73
velt          0.73
aku           0.72
zen           0.72
frames        0.71
Activations Density: 0.092%