Explanations
questions expressing confusion or disbelief about social expectations and criticisms
Negative Logits
ypi          -0.18
imbus        -0.16
adipiscing   -0.15
culus        -0.14
.pg          -0.14
odash        -0.13
ando         -0.13
atcher       -0.13
reh          -0.13
Net          -0.13
Positive Logits
ancel        0.17
ault         0.15
unan         0.15
.ibatis      0.15
Burton       0.15
omers        0.14
democr       0.14
funnel       0.14
burden       0.14
VML          0.14
Activations Density: 0.149%