INDEX
Explanations
phrases related to reasoning or explaining a conclusion
phrases related to accountability and consequences
Negative Logits
alde                     -0.63
BuyableInstoreAndOnline  -0.60
ellow                    -0.56
uties                    -0.56
Es                       -0.56
enges                    -0.56
DevOnline                -0.56
Raw                      -0.55
angler                   -0.54
asse                     -0.53
Positive Logits
thereafter    1.19
etheless      1.13
somehow       1.12
goddamn       1.07
nonetheless   1.01
fucking       1.00
furthermore   0.98
moreover      0.97
thereby       0.96
nevertheless  0.96
Activations Density: 1.020%