INDEX
Explanations
mentions of content removal and potential repercussions
references to tweet removals or alterations
Negative Logits
kindred          -0.68
Growing          -0.66
htaking          -0.66
stereotype       -0.62
superpower       -0.60
marrying         -0.60
Growing          -0.59
inav             -0.59
distinguishes    -0.59
dominates        -0.58
POSITIVE LOGITS
refund           1.00
screenshots      0.86
DMCA             0.85
refunds          0.84
deleted          0.84
apologised       0.83
retracted        0.82
apologies        0.82
redacted         0.82
reinstated       0.79
Activations Density: 1.586%