Explanations
news-related content
references to news content
Negative Logits
Token           Logit
¯¯              -0.71
inished         -0.71
grun            -0.69
agra            -0.67
ignt            -0.66
ueless          -0.65
SOLD            -0.64
occup           -0.64
unsuccessful    -0.64
Wee             -0.63
Positive Logits
Token           Logit
reader          0.97
headlines       0.95
NEWS            0.90
ource           0.88
orial           0.84
letters         0.82
feed            0.82
room            0.81
worthy          0.81
Catholic        0.78
Activations Density: 0.038%