Explanations
phrases indicating reasons or explanations
the word "why" used in various contexts throughout the text
Negative Logits
ーク     -0.78
ty       -0.67
rop      -0.67
aith     -0.67
ット     -0.63
bow      -0.63
bor      -0.62
zman     -0.60
result   -0.57
iox      -0.57
Positive Logits
why           0.91
we            0.89
they          0.81
soever        0.76
many          0.73
it            0.72
Canaver       0.72
terday        0.66
ratulations   0.66
there         0.66
Activations Density 0.048%
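
Dashboards for byte-level BPE models often display non-ASCII tokens in the tokenizer's byte-to-character remapping, under which `ーク` is shown as `ãĥ¼ãĤ¯` and `ット` as `ãĥĥãĥĪ`. The sketch below shows how such display strings can be turned back into readable text. It assumes the underlying model uses the GPT-2-style byte mapping, which this page does not state; the function names `bytes_to_unicode` and `decode_token` are illustrative and not taken from the dashboard.

```python
def bytes_to_unicode() -> dict[int, str]:
    """Byte -> printable character table (assumed GPT-2-style byte-level BPE)."""
    # Bytes that already render as visible characters map to themselves.
    keep = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    table = {b: chr(b) for b in keep}
    # Every remaining byte is assigned the next free code point from 256 upward.
    offset = 0
    for b in range(256):
        if b not in table:
            table[b] = chr(256 + offset)
            offset += 1
    return table


# Inverse table: displayed character -> original byte value.
CHAR_TO_BYTE = {ch: b for b, ch in bytes_to_unicode().items()}


def decode_token(display: str) -> str:
    """Convert a token's display string back to the UTF-8 text it encodes."""
    raw = bytes(CHAR_TO_BYTE[ch] for ch in display)
    # A token may end mid-character; replace incomplete sequences instead of failing.
    return raw.decode("utf-8", errors="replace")


print(decode_token("ãĥ¼ãĤ¯"))  # ーク
print(decode_token("ãĥĥãĥĪ"))  # ット
```

Plain ASCII tokens such as `why` or `result` pass through unchanged, so the same function can be applied to every row of both logit lists.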