INDEX
Explanations
names, specifically focusing on names ending in 'las' and 'Andreas'
names of individuals
Negative Logits
riter       -0.83
lance       -0.81
wards       -0.78
isite       -0.77
ished       -0.76
fare        -0.75
fecture     -0.75
ewitness    -0.74
lled        -0.74
lished      -0.73
Positive Logits
henko       0.96
andro       0.91
lav         0.89
andr        0.88
'           0.87
Magikarp    0.87
Kats        0.85
Maduro      0.84
Jere        0.84
Anton       0.84
Activations Density: 0.081%