INDEX
Explanations
proper nouns
references to individuals identified as "one of" in various contexts
Negative Logits
Token      Logit
equival    -0.64
respective -0.60
cats       -0.59
given      -0.59
icans      -0.53
Provided   -0.52
gif        -0.52
noses      -0.51
asions     -0.51
nas        -0.50
Positive Logits
Token      Logit
Hundred    0.85
hundred    0.84
of         0.73
Drive      0.73
step       0.71
month      0.66
esan       0.66
eenth      0.66
teenth     0.65
kilomet    0.64
Activations Density: 0.060%
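
A density figure like this is usually read as the fraction of token positions on which the feature is active. The sketch below shows that arithmetic under this assumption. The function name `activation_density`, the zero threshold, and the sample data are all hypothetical and are not taken from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds threshold.

    Assumption: "density" means the share of token positions where the
    feature fires; the dashboard's exact definition is not stated.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: 10,000 token positions, 6 of which are active.
acts = [0.0] * 10_000
for i in (5, 120, 999, 4_000, 7_321, 9_876):
    acts[i] = 1.5

density = activation_density(acts)
print(f"{density:.3%}")  # 0.060%
```

With these made-up numbers, 6 active positions out of 10,000 give the same 0.060% shown above, which suggests how sparse the feature is.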