INDEX
Explanations
mentions of the media
Negative Logits
vasive        -0.82
thens         -0.73
Parenthood    -0.71
ajor          -0.70
cker          -0.69
Sap           -0.68
Tec           -0.65
erville       -0.63
Wrath         -0.63
adian         -0.63
Positive Logits
media         1.06
outlets       1.02
media         1.00
eval          1.00
Media         0.93
outlet        0.86
wiki          0.83
mog           0.83
medi          0.74
plurality     0.73
Activations Density: 0.031%