INDEX
Explanations
incidents involving physical harm or violence
events involving violent actions or assaults
Negative Logits
ãĤ´ãĥ³          -0.80
singular        -0.74
extraord        -0.70
aster           -0.69
thous           -0.68
clich           -0.67
understatement  -0.66
miracle         -0.66
everlasting     -0.65
miracles        -0.62
Positive Logits
biking          0.92
bicy            0.91
travelling      0.90
attempting      0.88
protesting      0.88
fleeing         0.87
parked          0.86
hiking          0.86
filming         0.83
exercising      0.83
Activations Density 0.294%