Explanations
references to people, appointments, and agreements in a political context
Negative Logits
yna       -0.15
vala      -0.15
ISIBLE    -0.15
itel      -0.15
wf        -0.15
Vir       -0.14
mf        -0.14
Winds     -0.14
yd        -0.13
endir     -0.13
Positive Logits
.BLL      0.18
jenter    0.15
offending 0.14
िथ        0.13
ÏĦÏīν     0.13
raÄį      0.13
parten    0.13
å®        0.13
Ń         0.13
rane      0.12
Activations Density 0.035%