INDEX
Explanations
phrases indicating an author's name and affiliation
references to news articles and authorship
Negative Logits
calendars    -0.73
cabinets     -0.68
queues       -0.63
fridge       -0.61
refunds      -0.61
aults        -0.61
silence      -0.60
purs         -0.60
retaliate    -0.60
uments       -0.60
Positive Logits
Edited       0.93
Narr         0.86
Editor       0.80
BBC          0.79
EVA          0.78
roit         0.78
WASHINGTON   0.78
cerpt        0.75
SAN          0.74
Ù            0.74
Activations Density: 0.106%