INDEX
Explanations
names of individuals
references to individuals with the initial "A."
Negative Logits
anime       -0.69
overt       -0.68
intended    -0.65
Army        -0.64
odds        -0.61
aw          -0.59
aspir       -0.59
accuracy    -0.58
Army        -0.58
undergrad   -0.58
POSITIVE LOGITS
veyard      1.29
cker        1.14
plin        1.14
nder        1.13
uer         1.11
keley       1.09
lder        1.09
eger        1.08
gran        1.08
gha         1.07
Activations Density: 0.029%