INDEX
Explanations
phrases related to societal issues, activism, and reflections on historical events
repeated phrases that emphasize collective actions or thoughts
Negative Logits
REDACTED     -0.72
atory        -0.69
Publication  -0.65
Owner        -0.60
visor        -0.58
Tai          -0.58
cum          -0.58
math         -0.56
Mehran       -0.56
photos       -0.56
Positive Logits
're          1.33
've          1.23
akening      1.08
athered      1.08
'll          1.06
selves       1.03
'd           0.99
ourselves    0.99
ird          0.95
imar         0.94
Activations Density 0.238%
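
As a rough illustration of how a density figure like the one above can be read, the sketch below computes the fraction of token positions on which a feature fires. The page does not state how its density was computed, so the function name `activation_density`, the zero threshold, and the toy data are all assumptions for illustration only.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Hypothetical helper: the page does not say how its density is defined,
    so "fires" is assumed here to mean "activation strictly above threshold".
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Toy data (not from the page): 1000 token positions, 2 of them active.
sample = [0.0] * 998 + [1.7, 0.4]
density = activation_density(sample)
print(f"{density:.3%}")  # prints 0.200%
```

Under this reading, a density of 0.238% would mean the feature is active on roughly 2 to 3 of every 1000 token positions sampled.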