INDEX
Explanations
highly frequent proper nouns or entities, especially with repeating patterns in their names
references to various media entities and news sources
Negative Logits
Token         Logit
deduct        -0.74
additionally  -0.73
craw          -0.73
dece          -0.71
tram          -0.70
triv          -0.69
bundled       -0.69
larvae        -0.68
princ         -0.68
aggreg        -0.67
Positive Logits
Token         Logit
Dear          1.29
Former        1.23
SAN           1.21
LOS           1.20
WASHINGTON    1.20
Published     1.18
When          1.15
If            1.14
Whether       1.14
TOR           1.14
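
The logit lists above do not say how they were computed. A common convention in feature dashboards of this kind is to project the feature's output direction through the model's unembedding and rank tokens by the resulting score. The sketch below assumes that convention. The names `logit_effects` and `top_k`, the two-dimensional vectors, and the toy vocabulary are all hypothetical and are not taken from the source.

```python
def logit_effects(direction, unembed):
    """Dot the feature direction with each token's unembedding vector.

    `unembed` maps token -> vector of the same length as `direction`.
    Both are hypothetical toy inputs, not real model weights.
    """
    return {
        tok: sum(d * w for d, w in zip(direction, vec))
        for tok, vec in unembed.items()
    }


def top_k(effects, k, positive=True):
    """Return the k tokens with the largest (or most negative) effect."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1], reverse=positive)
    return ranked[:k]


# Toy example with made-up vectors (assumption, for illustration only).
direction = [1.0, -0.5]
unembed = {
    "Dear": [1.0, -0.5],
    "deduct": [-0.5, 0.5],
    "the": [0.1, 0.2],
}
effects = logit_effects(direction, unembed)
print(top_k(effects, 1, positive=True))
print(top_k(effects, 1, positive=False))
```

Under this reading, a positive value means the feature pushes the model toward predicting that token and a negative value means it pushes away from it.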
Activations Density 0.208%
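
Activation density is usually read as the fraction of sampled tokens on which the feature fires at all. The source does not define it, so the sketch below assumes that reading, with a hypothetical `activation_density` helper and toy counts chosen only to show how a figure like 0.208% can arise.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of token positions whose activation exceeds `threshold`.

    Assumption: a feature counts as "active" on a token when its
    activation is strictly greater than the threshold (0 by default).
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Toy data (hypothetical): 12,500 tokens, 26 of them with a nonzero activation.
acts = [0.0] * 12474 + [1.0] * 26
density = activation_density(acts)
print(f"{density:.3%}")
```

A density this low means the feature is sparse: it is silent on the vast majority of tokens.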