INDEX
Explanations
discussion around social and political issues, particularly related to feminism, the internet, entitlement, and societal expectations
Negative Logits
suicides      -0.59
neutral       -0.58
aldehyde      -0.56
horm          -0.56
ars           -0.55
attrition     -0.55
confirmation  -0.54
notably       -0.53
wash          -0.53
isol          -0.53
Positive Logits
entails       0.78
entail        0.76
ACA           0.72
aca           0.72
Anyway        0.71
Cry           0.70
SourceFile    0.66
FFER          0.66
FACE          0.65
versus        0.65
Activations Density: 1.964%