INDEX
Explanations
instances of the word "you" and its various forms
Negative Logits
WER      -0.17
ocht     -0.15
æĦı      -0.15
nees     -0.15
ÑĤим     -0.14
ISTR     -0.14
Alive    -0.13
RIPT     -0.13
orry     -0.13
-cols    -0.13
Positive Logits
Econ     0.17
/e       0.14
èĻ       0.14
emiz     0.14
ured     0.14
arily    0.14
vit      0.14
Ļ        0.14
оÑĢов    0.14
arrera   0.14
Activations Density 0.035%