INDEX
Explanations
phrases related to sanctions or restricted/prohibited items/people/activities
terms related to sanctuary cities and online communities
NEGATIVE LOGITS
Kingdoms  -0.75
REE       -0.72
Disp      -0.68
Scand     -0.67
sg        -0.63
ython     -0.62
ANY       -0.61
Rocks     -0.61
Lunar     -0.59
ï¸ı       -0.59
POSITIVE LOGITS
ctuary    1.03
ufact     1.01
ovic      0.91
ews       0.90
esthetic  0.90
nered     0.90
quez      0.87
abilia    0.84
thood     0.84
onymous   0.84
Activations Density 0.034%