INDEX
Explanations
phrases conveying ongoing actions or states of being
Negative Logits
ambi
-0.15
grim
-0.14
__)
-0.14
arel
-0.13
esson
-0.13
slack
-0.13
hoot
-0.13
pj
-0.13
à¥
-0.13
Inverse
-0.13
Positive Logits
rosso
0.15
że
0.15
ÃŃÅ¡
0.14
orer
0.14
olec
0.14
="{!!
0.14
avern
0.13
-valid
0.13
aklı
0.13
eni
0.13
Activations Density 0.018%