INDEX
Explanations
words related to causality or consequence
the word "hence" in various contexts
Negative Logits
hitter      -0.65
nurs        -0.64
abies       -0.64
estation    -0.63
Bull        -0.59
%-          -0.58
batter      -0.57
Tasman      -0.56
battered    -0.56
Mehran      -0.56
Positive Logits
forth       2.11
forward     1.45
noon        0.85
oji         0.78
far         0.77
why         0.77
hua         0.77
alf         0.76
rely        0.75
ween        0.75
Activations Density: 0.014%