INDEX
Explanations
phrases indicating a reason or cause
instances of the word "because" indicating reasons or justifications
Negative Logits
mint    -0.73
nin     -0.71
yan     -0.68
ries    -0.67
hal     -0.65
Scar    -0.65
wn      -0.63
Ku      -0.62
nex     -0.61
sey     -0.61
Positive Logits
*/(            0.97
uristic        0.73
they           0.69
nobody         0.67
":"/           0.65
we             0.64
assetsadobe    0.64
urers          0.64
there          0.64
uras           0.63
Activations Density: 0.069%