INDEX
Explanations
people's names, specifically those containing the letter 'A'
Negative Logits
anime         -0.66
undergrad     -0.64
Eclipse       -0.59
enterprise    -0.58
adults        -0.57
aggregate     -0.57
intended      -0.57
millions      -0.57
Army          -0.57
enclosed      -0.56
Positive Logits
hern          1.12
veyard        1.12
oki           1.07
cknowled      1.06
vil           1.03
ileen         1.01
verett        1.01
uld           1.00
plin          1.00
lder          1.00
Activations Density: 0.023%