INDEX
Explanations
names of people
proper nouns, particularly names
Negative Logits
suspic      -0.77
includ      -0.66
targ        -0.65
NETWORK     -0.64
ccording    -0.64
SYSTEM      -0.64
referen     -0.62
destro      -0.62
behav       -0.60
XCOM        -0.60
Positive Logits
alike             1.03
eele              0.91
axter             0.88
esson             0.86
ilda              0.85
oliath            0.79
meier             0.77
bur               0.75
Ru                0.74
soDeliveryDate    0.74
Activations Density: 0.253%