Explanations
mention of specific organizations, places, or groups
Tokens after publication titles or dates
news articles with dates
Negative Logits
GEBURTSDATUM    -0.71
Commencez    -0.67
autorytatywna    -0.58
〒    -0.56
aidé    -0.55
Привет    -0.54
الإنجليزية    -0.53
kezés    -0.53
فريبيس    -0.52
Уважаемые    -0.52
Positive Logits
news    1.08
news    0.90
article    0.88
News    0.86
headlines    0.84
News    0.84
Reporters    0.82
article    0.81
journalists    0.76
notícia    0.76
Activations Density 0.151%