INDEX
Explanations
mentions of locations or places
instances of the verb "be."
Negative Logits
rador       -0.81
azines      -0.80
monop       -0.72
partName    -0.70
drift       -0.66
ilver       -0.65
Ples        -0.65
hawk        -0.65
Palestin    -0.65
prefrontal  -0.64
Positive Logits
yond        1.34
arers       1.12
arer        1.09
FORE        1.07
cker        1.04
zos         0.94
gotten      0.93
ards        0.92
ardless     0.89
ige         0.87
Activations Density 0.025%