Explanations
phrases or sentences indicating a reason or justification
instances of the word "because" indicating reasoning or justification
NEGATIVE LOGITS
wn          -0.72
jet         -0.70
pmwiki      -0.68
alysed      -0.67
iets        -0.66
whel        -0.64
ax          -0.64
exting      -0.64
ns          -0.64
moderate    -0.64
POSITIVE LOGITS
otherwise    1.05
unlike       0.95
hey          0.94
nobody       0.91
there        0.91
frankly      0.89
it           0.88
they         0.88
obviously    0.88
we           0.85
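The two logit lists are the lowest- and highest-valued tokens from a single per-token score table. The page does not say how the scores themselves are computed, so the sketch below covers only the ranking step. The helper name `top_logits` and the six-entry sample are hypothetical, and the sample values are copied from the lists above.

```python
# Hypothetical helper for illustration; not the dashboard's own code.
def top_logits(effects, k):
    """Return (most_negative, most_positive) lists of (token, value) pairs."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])
    negative = ranked[:k]        # lowest values first
    positive = ranked[::-1][:k]  # highest values first
    return negative, positive

# Small sample taken from the lists on this page.
effects = {
    "wn": -0.72, "jet": -0.70, "pmwiki": -0.68,
    "otherwise": 1.05, "unlike": 0.95, "hey": 0.94,
}
negative, positive = top_logits(effects, k=2)
print(negative)
print(positive)
```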
Activations Density 0.093%
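As a rough reading aid, and assuming that density means the fraction of tokens on which the feature has a non-zero activation (the page does not define it), 0.093% corresponds to about 93 active tokens per 100,000. The function name and the synthetic data below are illustrative only.

```python
# Assumed definition: density = share of tokens with activation > 0.
def activation_density(activations):
    """Return the fraction of entries that are strictly positive."""
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > 0)
    return active / len(activations)

# Synthetic data: 93 active tokens out of 100,000.
sample = [1.0] * 93 + [0.0] * 99_907
density = activation_density(sample)
print(f"{density:.3%}")
```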