INDEX
Explanations
mentions of sanctuary cities or related terms
references to sanctuary cities and related policies
Negative Logits
resil      -0.73
ahon       -0.72
liam       -0.69
UGC        -0.66
lass       -0.66
phies      -0.64
ripp       -0.61
dule       -0.61
inness     -0.59
product    -0.59
Positive Logits
ctuary      1.05
sanctuary   0.94
refuge      0.87
grounds     0.86
cities      0.79
havens      0.78
Refuge      0.77
gence       0.75
TING        0.75
keeper      0.71
Activations Density 0.080%