INDEX
Explanations
profanity and insults
sentences that convey strong opinions or critiques
Negative Logits
enrol          -0.82
emanc          -0.77
isbury         -0.76
jointly        -0.75
exclusively    -0.71
authorized     -0.71
extensively    -0.70
interpreter    -0.70
footing        -0.69
eteenth        -0.68
Positive Logits
Anyway         1.39
Especially     1.28
;)             1.25
Seriously      1.20
Anyway         1.19
Thankfully     1.19
Maybe          1.16
Luckily        1.15
Besides        1.13
That           1.11
Activations Density 0.797%
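
The page does not define "Activations Density". A common reading, assumed here, is the fraction of token positions on which the feature has a nonzero activation. The sketch below illustrates that reading only; the function name `activation_density`, the `threshold` parameter, and the sample values are hypothetical and are not taken from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: "density" means (positions where the feature fires) / (all positions).
    """
    if not activations:
        return 0.0
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Made-up activations for one feature over 1000 token positions:
# 8 positions fire, the remaining 992 are exactly zero.
sample = [0.0] * 992 + [0.4, 1.2, 0.7, 2.1, 0.3, 0.9, 1.5, 0.6]

density = activation_density(sample)
print(f"{density:.3%}")  # prints 0.800%
```

Under this reading, a density of 0.797% would mean the feature fires on roughly 8 of every 1000 token positions.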