Explanations
words related to security or protection
references to guards or security personnel
Negative Logits
Token      Logit
BLE        -0.84
ãĥ£        -0.75
Springs    -0.68
izoph      -0.66
KEN        -0.66
MAP        -0.66
theless    -0.65
MED        -0.61
ctive      -0.60
sequent    -0.60
Positive Logits
Token      Logit
rail       0.97
men        0.87
heed       0.85
ages       0.84
guard      0.83
guarding   0.83
masters    0.81
maid       0.80
dog        0.79
llan       0.78
Activations Density: 0.028%