INDEX
Explanations
references to physical barriers or obstacles, such as barricades and walls
references to barriers or segregation
Negative Logits
din           -0.71
NOR           -0.68
MIN           -0.66
americ        -0.66
gil           -0.65
WIN           -0.63
hemor         -0.59
negatively    -0.58
prosecut      -0.57
ICT           -0.57
Positive Logits
atural        1.33
et            1.33
eties         1.30
ets           1.27
ety           1.25
eness         1.23
etry          1.20
ements        1.14
estone        1.12
ete           1.10
Activations Density: 0.123%