INDEX
Explanations
mentions of the mainstream media
repeated references to mainstream media
Negative Logits
arcity    -0.83
atoon     -0.82
cific     -0.81
Tome      -0.73
otos      -0.70
tein      -0.69
thur      -0.68
uana      -0.68
alid      -0.68
haps      -0.67
Positive Logits
media         0.88
outlets       0.85
ization       0.84
mainstream    0.79
iary          0.78
arily         0.78
ership        0.76
ing           0.74
wisdom        0.73
isation       0.72
Activations Density 0.039%