INDEX
Explanations
phrases related to political messages displayed through signs or stickers
quotations and slogans related to social justice issues
Negative Logits
stride       -0.78
term         -0.73
blush        -0.72
chunk        -0.72
theoret      -0.71
scales       -0.71
frame        -0.71
topic        -0.71
predicate    -0.69
subject      -0.69
Positive Logits
RIP                1.12
Dear               1.07
Goodbye            1.01
Congratulations    0.99
$$$$               0.97
Tomorrow           0.94
Eat                0.92
ESCO               0.91
LO                 0.91
ãģĤ                0.89
Activations Density 0.199%