INDEX
Explanations
proper nouns or entities mentioned in a rather technical or formal context
pronouns and definitive statements
Negative Logits
Enlarge   -0.60
prising   -0.54
Had       -0.53
attering  -0.52
paio      -0.52
Marginal  -0.52
awed      -0.51
ences     -0.51
790       -0.50
wana      -0.50
Positive Logits
is   1.38
Is   1.06
are  1.05
IS   1.00
is   0.95
isn  0.95
was  0.85
Are  0.84
ARE  0.84
iss  0.84
Activations Density 0.398%