INDEX
Explanations
instances of minimization or downplaying of situations
Negative Logits
aise      -0.18
dorf      -0.17
idd       -0.15
obar      -0.15
301       -0.15
hd        -0.15
oci       -0.15
ica       -0.14
kla       -0.14
257       -0.14
Positive Logits
ventus    0.16
DTD       0.15
áv        0.15
usta      0.15
tons      0.15
UNT       0.14
ediator   0.14
/Dk       0.14
strchr    0.14
ekk       0.14
Activations Density: 0.158%