Explanations
names of individuals
references to specific people, particularly those with the names "He" and "She."
Negative Logits
etheless     -0.96
OPLE         -0.73
Cummings     -0.64
dotted       -0.64
grav         -0.63
quo          -0.63
WAYS         -0.63
needles      -0.63
stiffness    -0.60
terday       -0.59
Positive Logits
oran          1.02
ussie         0.93
chel          0.89
omer          0.89
oka           0.88
iber          0.87
itzer         0.86
mans          0.84
alf           0.84
alt           0.83
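The two lists rank vocabulary tokens by how strongly this feature pushes their output logits down or up. Dashboards like this one commonly get these values by projecting the feature's decoder direction through the model's unembedding matrix and keeping the k smallest and k largest entries; the card does not state its method, so treat that as an assumption. A minimal pure-Python sketch, where `feature_logits`, `top_bottom_tokens`, and the toy inputs are hypothetical:

```python
def feature_logits(decoder_dir, unembed):
    """Hypothetical: logit weight per token = dot(decoder direction, unembedding column).

    decoder_dir: list of d_model floats (the feature's decoder vector, assumed).
    unembed: d_model x vocab_size matrix as nested lists (assumed layout).
    """
    if len(decoder_dir) != len(unembed):
        raise ValueError("decoder_dir and unembed must share d_model")
    vocab_size = len(unembed[0])
    return [
        sum(decoder_dir[i] * unembed[i][j] for i in range(len(decoder_dir)))
        for j in range(vocab_size)
    ]


def top_bottom_tokens(logits, vocab, k):
    """Return the k most negative and k most positive (token, value) pairs."""
    ranked = sorted(zip(vocab, logits), key=lambda pair: pair[1])
    negative = ranked[:k]          # ascending: most negative first
    positive = ranked[::-1][:k]    # descending: most positive first
    return negative, positive


if __name__ == "__main__":
    # Toy 2-dimensional model with a 3-token vocabulary (made-up numbers).
    logits = feature_logits([1.0, -0.5], [[0.2, -0.4, 0.9], [0.6, 0.1, -0.2]])
    neg, pos = top_bottom_tokens(logits, ["tok_a", "tok_b", "tok_c"], 1)
    print(neg, pos)
```

Sorting once and slicing both ends keeps the two lists consistent with each other; for a real vocabulary of tens of thousands of tokens, a partial selection such as `heapq.nlargest` would avoid the full sort.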
Activation Density: 0.145%
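Activation density reads as the share of token positions on which the feature is active. A minimal sketch, assuming "active" means an activation strictly greater than zero (the card does not state its threshold); `activation_density` is a hypothetical helper:

```python
def activation_density(activations):
    """Fraction of token positions where the feature fires (activation > 0, assumed threshold)."""
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > 0)
    return firing / len(activations)


if __name__ == "__main__":
    # Illustrative only: a feature firing on 29 of 20,000 token positions.
    acts = [0.0] * 19_971 + [0.5] * 29
    print(f"{activation_density(acts):.3%}")
```

As an illustration of the arithmetic, 29 firing positions out of 20,000 give 29 / 20,000 = 0.00145, which displays as 0.145%.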