Explanations
- the word "did" in various contexts
- "why did" questions
Negative Logits
Token        Logit
houſe        -0.80
pleaſure     -0.77
ſtate        -0.75
Houſe        -0.69
ſelves       -0.68
purpoſe      -0.64
ſtre         -0.63
NameInMap    -0.60
ſche         -0.59
ſte          -0.57
Positive Logits
Token        Logit
did          0.80
did          0.77
Did          0.77
Did          0.69
was          0.66
DID          0.64
DID          0.58
had          0.57
Twas         0.56
gave         0.55
Activations Density: 0.055%
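
The page does not define the density figure. A minimal sketch, assuming it means the share of sampled tokens on which the feature activates above zero; the function name, the threshold parameter, and the sample data are all hypothetical and only illustrate the arithmetic:

```python
def activation_density(activations, threshold=0.0):
    """Return the percentage of tokens whose activation exceeds `threshold`.

    Assumption: "density" is the fraction of tokens with a non-zero
    activation. The source page does not state its exact definition.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical sample: the feature fires on 11 of 20,000 tokens.
sample = [0.0] * 19_989 + [1.3] * 11
print(f"{activation_density(sample):.3f}%")  # prints 0.055%
```

Under that reading, a density of 0.055% would mean the feature is active on roughly one token in every 1,800.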