Explanations
mentions of a specific news network
Negative Logits
rians       -0.82
akeru       -0.78
inval       -0.75
rian        -0.74
ictions     -0.71
acterial    -0.70
rogens      -0.70
ansom       -0.69
adding      -0.69
assy        -0.68
Positive Logits
conn         1.50
News         0.98
News         0.94
hawk         0.89
hound        0.89
woods        0.86
croft        0.85
fox          0.83
cat          0.83
FOX          0.83
Activations Density 0.519%