INDEX
Explanations
references to personal pronouns and questions about the subject or context
Negative Logits
enthal      -0.16
uis         -0.15
odus        -0.15
ÑģÑĤанÑĥ    -0.15
Dah         -0.15
enge        -0.14
lein        -0.14
.gf         -0.14
Aren        -0.14
Duch        -0.14
Positive Logits
does        0.57
does        0.45
Does        0.43
did         0.41
do          0.41
Does        0.40
_does       0.35
DOES        0.33
doe         0.31
did         0.30
Activations Density: 0.115%