Explanations
religious references or contexts in which authority figures are speaking
phrases related to violent or aggressive actions
Negative Logits
homebrew    -0.87
Wilmington  -0.82
Downs       -0.79
chirop      -0.79
Staten      -0.78
Craigslist  -0.77
downs       -0.77
brunch      -0.76
Franklin    -0.75
Asheville   -0.75
Positive Logits
ال        1.47
و         1.43
Ø         1.37
اØ        1.36
Ù         1.36
ا         1.34
Islamic   1.34
ت         1.34
Pakistan  1.33
م         1.32
Activations Density 0.360%