Explanations
The word "because" and its variants, indicating causal explanations or reasons.
Negative Logits
them     -0.19
ively    -0.16
ingt     -0.15
ERCHANT  -0.15
и        -0.15
him      -0.15
tron     -0.15
ors      -0.15
ship     -0.14
pues     -0.14
Positive Logits
they      0.24
unlike    0.24
there     0.23
latter    0.22
nothing   0.22
nobody    0.22
we        0.20
it        0.20
although  0.19
no        0.19
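The Negative and Positive Logits lists name the output tokens whose logits this feature most decreases and most increases. The page does not say how these values were computed. A common approach in SAE dashboards is to project the feature's decoder direction through the model's unembedding matrix and read off both ends of the result. The sketch below assumes exactly that. The function names `logit_effects` and `top_and_bottom` are hypothetical, and real dashboards may also fold in layer norm or other scaling.

```python
from typing import List, Sequence, Tuple


def logit_effects(
    decoder_vec: Sequence[float],
    unembed: Sequence[Sequence[float]],  # assumed shape: [d_model][vocab_size]
) -> List[float]:
    """Project one feature's decoder direction onto every vocabulary logit."""
    vocab_size = len(unembed[0])
    return [
        sum(decoder_vec[i] * unembed[i][t] for i in range(len(decoder_vec)))
        for t in range(vocab_size)
    ]


def top_and_bottom(
    effects: Sequence[float], vocab: Sequence[str], k: int
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most positive and the k most negative (token, value) pairs."""
    # Sort token indices once by effect, ascending; both lists come from the ends.
    order = sorted(range(len(effects)), key=lambda t: effects[t])
    negative = [(vocab[t], effects[t]) for t in order[:k]]  # most negative first
    positive = [(vocab[t], effects[t]) for t in reversed(order[-k:])]  # largest first
    return positive, negative
```

With a toy (hypothetical) two-dimensional model and four-token vocabulary, `top_and_bottom(logit_effects([1.0, 0.5], [[0.2, -0.1, 0.1, 0.0], [0.1, -0.2, 0.2, -0.1]]), ["they", "them", "unlike", "ship"], 2)` ranks "they" and "unlike" as positive and "them" and "ship" as negative. Sorting once and slicing both ends keeps the two lists consistent with each other.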
Activation Density: 0.094%
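An activation density of 0.094% means the feature fires on roughly 0.00094 of tokens, or about one token in every 1,060. The sketch below assumes the usual convention that density is the share of dataset tokens with a strictly positive activation. The page does not state which dataset or threshold produced its figure, and the function name `activation_density` is hypothetical.

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Fraction of tokens whose activation is strictly above `threshold`."""
    total = 0
    active = 0
    for a in activations:
        total += 1
        if a > threshold:
            active += 1
    if total == 0:
        raise ValueError("no activations supplied")
    return active / total
```

For example, a feature that is nonzero on 94 of 100,000 tokens has density 94 / 100,000 = 0.00094, which prints as `f"{0.00094 * 100:.3f}%"`, that is, 0.094%.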