Explanations
phrases that provide reasons or explanations
instances of the word "Because" indicating explanations or justifications
Negative Logits
shaw    -0.78
jet     -0.76
ns      -0.74
wn      -0.74
mint    -0.73
yan     -0.71
åĤ      -0.71
robe    -0.71
shr     -0.68
VR      -0.67
Positive Logits
fuck        0.70
beware      0.69
they        0.66
*/(         0.65
ecause      0.65
Prosper     0.64
there       0.64
âĶģ         0.64
olini       0.63
elligence   0.61
Activations Density: 0.049%