Explanations
words related to destructive acts such as vandalism, arson, and graffiti
instances of vandalism and related terms
Negative Logits
croft      -0.85
vana       -0.80
abol       -0.77
ramid      -0.77
pres       -0.75
obar       -0.75
equal      -0.73
gravity    -0.71
erald      -0.71
omo        -0.70
Positive Logits
vandalism   1.29
vandal      1.09
graffiti    1.06
spree       0.99
leve        0.82
dere        0.81
arson       0.76
scaven      0.75
rampage     0.74
ruction     0.70
Activation Density: 0.013%