Explanations
references to specific violent incidents or groups
Negative Logits
Token     Logit
Vin       -0.16
heiro     -0.14
Smy       -0.14
Mehmet    -0.14
meyi      -0.14
Lyft      -0.14
ios       -0.14
Foo       -0.14
Champ     -0.14
FreeBSD   -0.13
Positive Logits
Token        Logit
Nigeria      0.20
Rangers      0.17
Frontier     0.17
Texas        0.17
.FontStyle   0.17
Texas        0.17
Nigerian     0.17
Bair         0.17
Texans       0.16
Cameroon     0.16
Activations Density: 0.023%