Explanations
Names of individuals or specific entities mentioned in the text
Negative Logits
Token       Logit
angers      -0.80
keyes       -0.79
acea        -0.76
aceous      -0.72
keye        -0.71
spective    -0.70
aido        -0.70
alez        -0.69
orders      -0.69
lando       -0.68
Positive Logits
Token       Logit
leon        1.10
lled        0.95
eh          0.89
e           0.87
lla         0.84
ña          0.84
llan        0.81
ez          0.80
lling       0.78
xual        0.78
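Lists of negative and positive logits like the ones above are commonly obtained by projecting a feature's decoder direction through the model's unembedding matrix and ranking vocabulary tokens by the result. The sketch below assumes that convention (it ignores any final layer norm) and uses hypothetical names (`top_logit_tokens`, `decoder_dir`, `unembed`); it is not necessarily how this particular dashboard computed its values.

```python
def top_logit_tokens(decoder_dir, unembed, vocab, k=10):
    """Rank vocabulary tokens by one feature's direct logit contribution.

    Assumption (hypothetical setup): decoder_dir is the feature's decoder
    row (d_model floats) and unembed is W_U stored as d_model rows of
    vocab_size floats. Final layer norm is ignored.
    """
    d_model = len(decoder_dir)
    # logit[t] = sum over i of decoder_dir[i] * W_U[i][t]
    logits = [
        sum(decoder_dir[i] * unembed[i][t] for i in range(d_model))
        for t in range(len(vocab))
    ]
    # Sort ascending, so the most negative tokens come first.
    ranked = sorted(zip(vocab, logits), key=lambda pair: pair[1])
    negative = ranked[:k]        # most negative first
    positive = ranked[::-1][:k]  # most positive first
    return negative, positive


if __name__ == "__main__":
    # Toy example with d_model = 2 and a three-token vocabulary.
    neg, pos = top_logit_tokens(
        [1.0, -0.5],
        [[0.2, 1.0, -0.4], [0.6, -0.2, 0.4]],
        ["a", "b", "c"],
        k=2,
    )
    print("negative:", neg)
    print("positive:", pos)
```

In the toy example, token `b` receives the largest boost (about 1.1) and token `c` the strongest suppression (about -0.6).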
Activation Density: 0.024%
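Activation density is usually read as the fraction of token positions on which the feature fires. A density of 0.024% therefore corresponds to about 24 firing tokens per 100,000. The sketch below is a minimal version of that calculation. It assumes that "fires" means an activation strictly greater than zero; the function name and the `threshold` parameter are hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of token positions where the feature activation exceeds threshold.

    Assumption: a feature "fires" when its activation is strictly greater
    than threshold (0.0 by default).
    """
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return firing / len(activations)


if __name__ == "__main__":
    # 24 firing positions out of 100,000 tokens.
    acts = [0.0] * 99_976 + [1.0] * 24
    print(f"{activation_density(acts) * 100:.3f}%")
```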