INDEX
Explanations
phrases related to actions or statuses of people
emotions and states of conflict in narratives
Negative Logits
respectively    -0.73
?".             -0.72
selves          -0.70
thereof         -0.69
selves          -0.66
$.              -0.65
collectively    -0.64
VERTISEMENT     -0.64
ĨĴ              -0.64
}.              -0.63
Positive Logits
himself         0.81
itone           0.77
herself         0.69
veland          0.67
his             0.64
ansky           0.64
hirt            0.63
isconsin        0.56
agle            0.56
essen           0.56
Activations Density: 1.283%