INDEX
Explanations
mentions of government or institutional states
Negative Logits
thon       -0.20
teenth     -0.18
razione    -0.17
rung       -0.16
thes       -0.15
atti       -0.15
Ùĩ         -0.15
edd        -0.15
ding       -0.15
rij        -0.15
Positive Logits
Unidos     0.19
craft      0.19
boro       0.18
/local     0.18
wide       0.17
-issue     0.17
-wide      0.17
urdy       0.17
653        0.16
bound      0.16
Activations Density: 0.102%