Explanations
instances of the word "when."
Negative Logits
Token     Logit
them      -0.19
ise       -0.17
orsi      -0.15
tieten    -0.15
unately   -0.15
herself   -0.15
ize       -0.14
пока      -0.14
ly        -0.14
ally      -0.14
Positive Logits
Token     Logit
/if       0.47
soever    0.43
EVER      0.33
they      0.32
faced     0.31
asked     0.29
it        0.29
we        0.28
/how      0.28
compared  0.27
Activations Density 0.138%