INDEX
Explanations
phrases indicating disagreement or opposition
references to political accountability and ethical considerations
Negative Logits
SPONSORED   -0.70
ema         -0.69
HCR         -0.67
Returning   -0.65
icum        -0.63
Exit        -0.63
mes         -0.61
911         -0.61
combat      -0.61
hide        -0.61
Positive Logits
pretty      0.87
kinda       0.82
plenty      0.80
kidding     0.79
spoilers    0.75
typo        0.70
nerds       0.69
crappy      0.69
;)          0.68
ner         0.68
Activations Density 0.607%