INDEX
Explanations
pronouns and determiners that often signal references to people, events, or specifics in a narrative context
Negative Logits
_mC         -0.20
_mB         -0.18
_mD         -0.17
_mE         -0.16
sovereign   -0.15
iter        -0.15
otts        -0.15
_tD         -0.15
.uni        -0.14
itter       -0.14
Positive Logits
isha        0.18
fir         0.15
Fir         0.14
umerator    0.14
imes        0.14
ninger      0.14
oka         0.14
oken        0.14
compan      0.14
inda        0.13
Activations Density 0.270%