INDEX
Explanations
individuals or groups taking action by speaking up about a particular issue or coming forward with information
instances of people coming forward to share their experiences or allegations
Negative Logits
rip       -0.92
acca      -0.75
Bots      -0.73
tch       -0.69
arcity    -0.67
raltar    -0.64
lez       -0.64
onga      -0.61
gee       -0.61
nic       -0.61
Positive Logits
osal           0.74
testimonies    0.68
onyms          0.65
allegations    0.65
voluntarily    0.63
submissions    0.63
toile          0.63
pronouns       0.63
vier           0.62
olicy          0.60
Activations Density 0.011%
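
An activation density figure like this one is usually read as the fraction of sampled tokens on which the feature fires. The sketch below works under that assumption; the function name `activation_density`, the zero threshold, and the toy data are hypothetical and are not taken from this page.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumption: a token counts as "active" when its activation is strictly
    greater than the threshold (0.0 by default).
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Toy data (hypothetical): one active token out of 10,000.
toy = [0.0] * 9_999 + [3.2]
density = activation_density(toy)
print(f"{density:.3%}")  # prints 0.010%
```

A density this low means the feature is sparse: it is silent on nearly all tokens, which fits a feature tied to a narrow theme such as testimonies and allegations.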