INDEX
Explanations
expressions of strong emotions and personal experiences
Negative Logits
æ©          -0.63
withd       -0.58
redes       -0.56
Cooke       -0.55
imposed     -0.53
etheless    -0.53
ļéĨĴ        -0.52
lished      -0.52
jri         -0.51
req         -0.51
Positive Logits
"—          0.97
"?          0.97
,'"         0.96
"]          0.96
")          0.94
%"          0.91
"),         0.87
":          0.83
zbollah     0.83
.")         0.83
Activations Density 0.268%
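
An activation density such as the one above can be read as the share of sampled token positions on which the feature fires. The sketch below is a minimal illustration, assuming density means the fraction of positions with an activation above zero; the dashboard's exact definition and sample are not stated here, and the function name, threshold parameter, and toy data are hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of activations strictly above `threshold`.

    Assumption: "density" is the share of token positions where the
    feature is active; the dashboard may define it differently.
    """
    values = list(activations)
    if not values:
        raise ValueError("need at least one activation")
    fired = sum(1 for a in values if a > threshold)
    return fired / len(values)


# Toy data (hypothetical): 1000 token positions, 3 of them active.
toy = [0.0] * 997 + [1.2, 0.4, 3.1]
print(f"{activation_density(toy):.3%}")
```

Under this reading, a density of 0.268% would correspond to the feature being active on roughly 27 of every 10,000 sampled tokens.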