Explanations
Names of individuals, particularly those with the first name "Thomas" or "Tom."
Negative Logits
ouz      -0.16
ToLocal  -0.15
enta     -0.14
ynes     -0.14
ipse     -0.14
elyn     -0.14
obao     -0.14
idlo     -0.14
nds      -0.13
cbc      -0.13
Positive Logits
-chan  0.16
ors    0.15
uke    0.15
нав    0.15
er     0.15
zilla  0.14
uto    0.14
echa   0.14
egal   0.14
å±     0.14
Activations Density: 0.019%