Explanations
the word "blog"
Negative Logits
myſelf        -0.85
Jefus         -0.85
avoient       -0.84
ſelves        -0.84
étoient       -0.83
verständlich  -0.80
ſhe           -0.79
sanitaires    -0.79
himſelf       -0.77
тьяна         -0.76
Positive Logits
xz            0.63
<eos>         0.61
              0.58
]=="          0.56
$             0.56
("{           0.55
录            0.55
Des           0.55
tn            0.55
mek           0.54
Activations Density 0.304%