Explanations
adverbs that imply a degree of caution or deliberateness
Negative Logits
-0.16  pu
-0.16  stead
-0.15  pet
-0.14
-0.14  bit
-0.14  izar
-0.13  pa
-0.13  _WIN
-0.13  Brush
-0.13  đổi
Positive Logits
0.15  rosse
0.15  embro
0.14  esch
0.14  ागत
0.14  éĸ
0.14  wit
0.14  sız
0.14  irim
0.14  kola
0.13  ه
Activations Density 0.305%
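
The page does not define how the density figure is computed. A minimal sketch, assuming it is the fraction of sampled token positions on which the feature's activation is above zero; the function name `activation_density`, the `threshold` parameter, and the sample data are all hypothetical and not taken from the page:

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of token positions on which the
    feature fires; the source page does not state its definition.
    """
    if not activations:
        raise ValueError("need at least one activation value")
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Hypothetical data: 2000 token positions, 6 of which activate the feature.
sample = [0.0] * 1994 + [0.7, 1.2, 0.3, 2.1, 0.9, 0.4]
density = activation_density(sample)
print(f"{density:.3%}")
```

Under this reading, a density of 0.305% would mean the feature fires on roughly 3 of every 1000 sampled tokens.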