Explanations
negative descriptors related to arguments or criticisms
Negative Logits
Token      Logit
reeze      -0.16
bastante   -0.15
ers        -0.15
olib       -0.15
cek        -0.15
ien        -0.15
ycled      -0.14
uku        -0.14
wan        -0.14
byn        -0.14
Positive Logits
Token      Logit
that       0.26
that       0.25
että       0.20
daß        0.19
что        0.19
that       0.19
nobody     0.19
it         0.18
że         0.18
bahwa      0.18
Activations Density 0.118%