Explanations
instances where the word "since" is used in sentences
phrases indicating a causal relationship or temporal markers in a discussion
Negative Logits
Token             Logit
pione             -0.84
exting            -0.79
vantage           -0.77
ilan              -0.77
displayText       -0.77
pec               -0.74
ā                 -0.74
RandomRedditor    -0.73
Ď                 -0.73
û                 -0.73
Positive Logits
Token             Logit
they               1.09
many               1.06
there              1.04
most               1.03
neither            1.02
nobody             1.01
it                 0.97
we                 0.92
none               0.89
rely               0.85
Activations Density: 0.180%
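
The page does not say how the density figure is computed. A common reading, assumed here, is the fraction of token positions on which the feature's activation is above zero. The sketch below illustrates that reading only; the function name `activation_density`, the `threshold` parameter, and the sample data are hypothetical and not taken from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: "density" means the share of positions where the feature fires.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: 10,000 token positions, 18 of which activate the feature.
sample = [0.0] * 9982 + [1.5] * 18
density = activation_density(sample)
print(f"{density:.3%}")  # prints 0.180%
```

Under this assumption, a density of 0.180% would mean the feature fires on roughly 18 of every 10,000 tokens, which fits a feature tied to a narrow pattern such as the word "since".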