Explanations
questions and their corresponding structures in text
Negative Logits
Token     Logit
ophon     -0.19
oders     -0.15
Sort      -0.15
Sort      -0.15
ÑĥÑħ      -0.14
nette     -0.14
èĤ¥       -0.14
ká        -0.13
Make      -0.13
lineno    -0.13
Positive Logits
Token     Logit
how       0.27
what      0.24
How       0.23
whom      0.22
Does      0.21
Which     0.21
why       0.20
cui       0.20
who       0.20
how       0.19
Activations Density 0.142%
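
The density figure is not defined on this page. One common reading, assumed here, is the fraction of sampled token positions on which the feature fires at all. The sketch below illustrates that reading only; the function name `activation_density`, the `threshold` parameter, and the sample values are hypothetical and are not taken from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: "density" means the share of positions where the feature is
    active; the dashboard may compute it differently.
    """
    if not activations:
        return 0.0
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Made-up activations for one feature over 10,000 token positions,
# with the feature active on 14 of them.
sample = [0.0] * 9986 + [1.3] * 14
density = activation_density(sample)
print(f"{density:.3%}")
```

Under this reading, a density of 0.142% would mean the feature is active on roughly 1.4 of every 1,000 tokens, which fits a feature that responds to a narrow class of tokens such as question words.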