Explanations
phrases related to expressing opinions or giving warnings
statements and warnings regarding social and political issues
Negative Logits
ecast        -0.89
ocument      -0.82
ixture       -0.73
tyard        -0.71
ixt          -0.68
adena        -0.67
efficients   -0.64
ocaust       -0.64
andem        -0.64
etrical      -0.62
Positive Logits
accordingly   0.96
afterward     0.71
encour        0.70
sarcast       0.70
stressing     0.69
furthermore   0.68
vowed         0.67
nonetheless   0.66
blaming       0.66
rhet          0.64
Activations Density: 0.318%