INDEX
Explanations
references to physical actions or events
actions and accusations related to criminal behavior
Negative Logits
?).  -1.04
.).  -0.90
}.   -0.87
!).  -0.85
)).  -0.83
).   -0.82
)!   -0.77
]).  -0.76
]."  -0.75
).   -0.74
Positive Logits
properties  0.46
untled      0.44
Churchill   0.43
Kaplan      0.43
earchers    0.42
Gors        0.42
atin        0.42
Bernstein   0.42
Aberdeen    0.42
Uz          0.41
Activations Density: 2.784%