INDEX
Explanations
proper nouns related to a specific person named "Toni."
mentions of a specific individual named "Toni."
NEGATIVE LOGITS
Token     Logit
rants     -0.82
lished    -0.77
rir       -0.73
liest     -0.70
rified    -0.69
lain      -0.69
lif       -0.69
rum       -0.68
slot      -0.68
sed       -0.68
POSITIVE LOGITS
Token     Logit
zzo       1.07
zzle      1.02
Äĩ        1.02
oni       0.95
orno      0.90
zzi       0.89
ña        0.86
ón        0.85
ja        0.85
Äį        0.84
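The tokens "Äĩ" and "Äį" in the positive logits look like the printable stand-ins that a GPT-2-style byte-level BPE vocabulary uses for raw bytes. The page does not say which tokenizer the model uses, so treat that as an assumption. Under it, a minimal sketch for mapping such a displayed token back to readable text could look like this (the helper names `bytes_to_unicode` and `decode_token` are illustrative, not part of any tool referenced here):

```python
def bytes_to_unicode():
    """Byte -> printable character table in the style of GPT-2 byte-level BPE.

    Assumption: the model behind this page uses this convention.
    """
    printable = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    byte_values = printable[:]
    code_points = printable[:]
    offset = 0
    for b in range(256):
        if b not in printable:
            # Non-printable bytes are shifted to code points starting at 256.
            byte_values.append(b)
            code_points.append(256 + offset)
            offset += 1
    return {b: chr(c) for b, c in zip(byte_values, code_points)}


# Reverse table: displayed character -> original byte value.
UNICODE_TO_BYTE = {ch: b for b, ch in bytes_to_unicode().items()}


def decode_token(token: str) -> str:
    """Turn a displayed byte-level token back into UTF-8 text."""
    raw = bytes(UNICODE_TO_BYTE[ch] for ch in token)
    return raw.decode("utf-8", errors="replace")


for shown in ["Äĩ", "Äį"]:
    print(repr(shown), "->", repr(decode_token(shown)))
```

If the assumption holds, both entries decode to single accented letters, which would be consistent with the name-like suffixes elsewhere in the list (zzo, oni, ña, ón).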
Activations Density 0.011%
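The page does not define activation density. A common reading is the share of tokens on which the feature has a non-zero activation. Assuming that definition, a small sketch of the calculation follows; the function name `activation_density`, the `threshold` parameter, and the toy data are all hypothetical:

```python
def activation_density(activations, threshold=0.0):
    """Fraction of tokens whose activation exceeds `threshold`.

    Assumption: "density" means the share of tokens on which the feature fires.
    """
    if not activations:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Toy data, not taken from the page: 11 active tokens out of 100,000.
toy = [0.0] * 99_989 + [1.5] * 11
print(f"{activation_density(toy):.3%}")
```

At this density the feature fires on roughly one token in nine thousand, which fits a feature tied to one specific name.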