Explanations
words indicating weakness or negative qualities
Negative Logits
cek       -0.16
ien       -0.15
ers       -0.15
iesta     -0.14
eka       -0.14
ona       -0.14
routine   -0.14
olib      -0.14
+         -0.13
ve        -0.13
Positive Logits
that      0.25
bahwa     0.25
that      0.25
rằng      0.24
että      0.22
daß       0.21
dass      0.20
že        0.20
что       0.20
that      0.20
Activations Density 0.121%