INDEX
Explanations
the word "How" at the beginning of sentences
Negative Logits
goers       -0.64
ptions      -0.62
ultimate    -0.62
Feld        -0.61
outer       -0.58
article     -0.57
hereafter   -0.56
piece       -0.56
room        -0.56
Roller      -0.55
Positive Logits
soever      1.17
ever        1.13
ells        1.04
beit        1.03
ling        0.93
itzer       0.87
dy          0.83
ls          0.82
much        0.79
leep        0.79
Activations Density: 0.063%