INDEX
Explanations
mentions of the term "fake"
references to "fake news."
Negative Logits
Token                 Logit
xual                  -0.90
arching               -0.80
riott                 -0.78
onen                  -0.76
hem                   -0.73
guiActiveUnfocused    -0.73
ands                  -0.70
azard                 -0.68
bard                  -0.68
pour                  -0.68
Positive Logits
Token                 Logit
ument                 0.98
pas                   0.87
IDs                   0.84
news                  0.76
positives             0.74
ulent                 0.73
tan                   0.71
ulously               0.69
sounding              0.67
identities            0.65
Activations Density: 0.070%