INDEX
Explanations
the name "Ali" in various contexts
Negative Logits
Token        Logit
acters       -0.88
olicy        -0.84
IAL          -0.77
ilaterally   -0.76
ly           -0.75
ividual      -0.73
nect         -0.73
sylvania     -0.71
osuke        -0.70
lished       -0.68
Positive Logits
Token        Logit
yah          1.28
ño           1.02
ensis        0.98
Äĩ           0.96
ñ            0.94
ña           0.86
ère          0.86
ya           0.83
WAYS         0.82
ë            0.82
Activations Density: 0.006%