Explanations
statements expressing perception or subjective opinion
Negative Logits
ampoline    -0.15
nev         -0.15
onas        -0.14
ito         -0.14
to          -0.14
drop        -0.14
whats       -0.14
jÄĻ         -0.14
licensed    -0.14
icensed     -0.13
Positive Logits
rằng        0.16
840         0.14
xor         0.14
951         0.14
bay         0.14
hic         0.13
بات         0.13
tual        0.13
AVA         0.13
213         0.13
Activations Density: 0.040%