Explanations
references to actions involving conflict or confrontation
references to violent incidents and their implications
Negative Logits
his        -0.61
his        -0.60
their      -0.55
HIS        -0.55
inexper    -0.52
intended   -0.52
herself    -0.52
thy        -0.51
lest       -0.51
illary     -0.51
Positive Logits
ãĤ¼ãĤ¦ãĤ¹   0.70
Ñı          0.66
Lots        0.65
æĺ¯         0.64
Category    0.59
"}],"       0.59
ãĥ´ãĤ¡      0.57
\/\/        0.57
Ò           0.56
ocaly       0.56
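
Several positive-logit tokens in the list above (for example "ãĤ¼ãĤ¦ãĤ¹" and "æĺ¯") look like encoding debris, but they are consistent with the way byte-level BPE vocabularies display raw UTF-8 bytes. The sketch below assumes, without confirmation from this page, that the vocabulary uses the GPT-2 style byte-to-character table; the helper name decode_token is hypothetical and introduced only for illustration.

```python
def bytes_to_unicode():
    """Build the GPT-2 style table mapping each byte (0-255) to a printable character.

    Assumption: the dashboard's tokenizer uses this table; the page does not say so.
    """
    # Bytes that are already printable map to themselves.
    bs = list(range(33, 127)) + list(range(161, 173)) + list(range(174, 256))
    cs = bs[:]
    n = 0
    # Every remaining byte is shifted to an unused code point starting at 256.
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return {b: chr(c) for b, c in zip(bs, cs)}


# Inverse table: displayed character -> original byte value.
BYTE_DECODER = {ch: b for b, ch in bytes_to_unicode().items()}


def decode_token(token: str) -> str:
    """Hypothetical helper: turn a displayed token back into readable text.

    Incomplete UTF-8 sequences become U+FFFD instead of raising an error.
    """
    raw = bytes(BYTE_DECODER[ch] for ch in token)
    return raw.decode("utf-8", errors="replace")


if __name__ == "__main__":
    for tok in ["ãĤ¼ãĤ¦ãĤ¹", "Ñı", "æĺ¯", "ãĥ´ãĤ¡", "Lots"]:
        print(repr(tok), "->", repr(decode_token(tok)))
```

Under that assumption, "ãĤ¼ãĤ¦ãĤ¹" decodes to "ゼウス", "Ñı" to "я", "æĺ¯" to "是", and "ãĥ´ãĤ¡" to "ヴァ". The single character "Ò" corresponds to a lone lead byte (0xD2), so it does not decode to a complete character on its own.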
Activations Density 0.829%