Explanations
written content related to political or social commentary, particularly regarding feminism, activism, and controversial social issues
Negative Logits
estones       -0.82
Finder        -0.78
contact       -0.74
foreseen      -0.74
linem         -0.73
gauge         -0.70
oother        -0.69
ounter        -0.69
milestones    -0.69
joining       -0.69
Positive Logits
hypocrisy     1.67
hypocritical  1.48
disingen      1.36
hypoc         1.34
coward        1.33
hypocr        1.33
disgrace      1.33
arrogance     1.32
stupidity     1.30
ignorant      1.29
Activations Density 6.502%