Explanations
mentions of controversial political figures and topics
punctuation marks and their context in the text
Negative Logits
displacement   -0.94
disciplines    -0.89
hatch          -0.84
encomp         -0.83
colours        -0.83
prec           -0.82
foc            -0.81
unexpl         -0.81
strugg         -0.81
hydrogen       -0.81
Positive Logits
Anyway         1.72
UPDATE         1.59
Yesterday      1.54
SPONSORED      1.51
Apparently     1.51
Advertisement  1.46
Via            1.44
Here           1.44
CNN            1.43
Guest          1.40
Activations Density 0.491%