INDEX
Explanations
adjectives describing emotional reactions
expressions of strong emotional reactions and opinions
Negative Logits
Brief      -0.63
indist     -0.60
gradual    -0.60
uly        -0.59
equ        -0.59
escal      -0.58
freely     -0.58
few        -0.57
arser      -0.57
annis      -0.57
Positive Logits
indeed       0.68
––           0.68
considering  0.67
hyde         0.65
ãĤ¬          0.64
ktop         0.64
rez          0.62
mercial      0.62
Ùĩ           0.62
insofar      0.62
Activations Density 0.400%