Explanations
references to political parties and their affiliations
Negative Logits
Token       Logit
ister       -0.19
Bis         -0.18
isters      -0.16
het         -0.16
ri          -0.16
emer        -0.15
ken         -0.14
ij          -0.14
bane        -0.14
enberg      -0.14
Positive Logits
Token         Logit
ば            0.15
Qué           0.15
ограм         0.14
ammable       0.14
-sponsored    0.14
ament         0.14
vertisement   0.13
uko           0.13
HeaderValue   0.13
assium        0.13
Activations Density 0.014%