Explanations
mentions or abbreviations related to news sources or organizations
references to national security
Negative Logits
fare          -0.73
ć             -0.72
======        -0.65
zona          -0.62
thumbnails    -0.62
verages       -0.61
mats          -0.60
ments         -0.60
otherwise     -0.60
ça            -0.59
Positive Logits
FW            1.26
fw            1.12
ertodd        1.01
ettings       0.85
erve          0.82
lé            0.82
erver         0.81
daq           0.81
ensitive      0.81
ession        0.80
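The page does not say how the logit scores were computed. Dashboards of this kind commonly project the feature's decoder direction onto each column of the model's unembedding matrix and list the lowest and highest scores. The sketch below assumes exactly that. The names `logit_effects`, `top_tokens`, `decoder_dir` and `unembed`, and the toy numbers in the usage lines, are all hypothetical and do not come from this feature.

```python
def logit_effects(decoder_dir, unembed):
    """Dot the feature's decoder direction with every token's unembedding column.

    decoder_dir: list of d_model floats (hypothetical decoder row for one feature)
    unembed: d_model rows, each a list of vocab_size floats (hypothetical W_U)
    Returns one score per token in the vocabulary.
    """
    vocab_size = len(unembed[0])
    return [
        sum(decoder_dir[i] * unembed[i][t] for i in range(len(decoder_dir)))
        for t in range(vocab_size)
    ]


def top_tokens(scores, vocab, k):
    """Return the k most negative and k most positive (token, score) pairs."""
    ranked = sorted(range(len(scores)), key=lambda t: scores[t])  # ascending
    negative = [(vocab[t], round(scores[t], 2)) for t in ranked[:k]]
    positive = [(vocab[t], round(scores[t], 2)) for t in reversed(ranked[-k:])]
    return negative, positive


# Toy usage with a 2-dimensional model and a 4-token vocabulary (made-up values).
vocab = ["tok_a", "tok_b", "tok_c", "tok_d"]
decoder_dir = [1.0, 0.5]
unembed = [[-1.0, 2.0, 0.5, 0.0],
           [0.0, 2.0, -1.0, 1.0]]
negative, positive = top_tokens(logit_effects(decoder_dir, unembed), vocab, k=2)
print(negative)  # [('tok_a', -1.0), ('tok_c', 0.0)]
print(positive)  # [('tok_b', 3.0), ('tok_d', 0.5)]
```

Real implementations may also fold in the final layer norm or normalize the decoder row first. That would change the scale of the scores but could leave the ranking largely intact.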
Activation density: 0.018%
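Activation density is read here as the percentage of sampled tokens on which the feature fires. At 0.018%, that works out to roughly 18 firing tokens per 100,000, so this is a very sparse feature. The sketch below assumes "fires" means an activation strictly above zero. The function name and the `threshold` parameter are hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Percentage of tokens whose feature activation exceeds `threshold`."""
    if not activations:
        raise ValueError("need at least one token activation")
    fired = sum(1 for a in activations if a > threshold)
    return 100.0 * fired / len(activations)


# 18 firing tokens out of 100,000 sampled tokens (illustrative counts only).
sample = [0.0] * 99_982 + [1.0] * 18
print(activation_density(sample))  # 0.018
```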