INDEX
Explanations
instances of physical violence or conflict
instances of physical confrontation or violence
Negative Logits
Greenwich   -0.68
irteen      -0.67
Greenwood   -0.65
Sutton      -0.62
Lyons       -0.61
Gutenberg   -0.61
Ellison     -0.60
Alphabet    -0.59
Revis       -0.59
Rodham      -0.58
Positive Logits
doesnt      1.22
alot        1.20
dont        1.17
didnt       1.11
haha        0.96
lol         0.96
thats       0.93
nt          0.90
tho         0.89
kinda       0.88
Activations Density: 1.954%