INDEX
Explanations
mentions of a specific physical barrier being built
references to a border wall
Negative Logits
Gene          -0.77
uner          -0.75
lishing       -0.71
NI            -0.68
é¾įåĸļ士      -0.65
ria           -0.65
ISTER         -0.65
CLASSIFIED    -0.65
×ķ            -0.64
Reward        -0.64
Positive Logits
abies         0.98
papers        0.97
wall          0.94
crossings     0.93
thickness     0.90
separating    0.90
walls         0.87
erected       0.85
aby           0.83
wart          0.81
Activations Density 0.018%