INDEX
Explanations
references to political events or actions
Negative Logits
ŃĶ              -0.69
ccording        -0.65
tremend         -0.63
Ô               -0.60
nonviolent      -0.58
entreprene      -0.58
subter          -0.57
senal           -0.56
ãĤ¼ãĤ¦ãĤ¹       -0.54
ij士            -0.54
POSITIVE LOGITS
↵               1.68
<|endoftext|>   1.26
↵↵              1.08
SPONSORED       0.93
Again           0.66
Alternatively   0.62
Includes        0.61
;}              0.61
Also            0.60
Unsure          0.60
Activations Density 1.121%