INDEX
Explanations
mentions of social and political current events
mentions of social and political issues
Negative Logits
.�           -0.66
".[          -0.58
Medium       -0.56
Redditor     -0.56
.}           -0.56
}.           -0.55
Rated        -0.55
SPONSORED    -0.54
âĵĺ          -0.53
''.          -0.53
Positive Logits
sequ         0.50
rament       0.49
ussion       0.48
rehearsal    0.47
exhaustive   0.44
deadline     0.44
oslav        0.44
iatus        0.43
technically  0.42
foregoing    0.42
Activations Density 2.429%