INDEX
Explanations
references to political party dynamics and hypocrisy
Negative Logits
Wich      -0.17
Ïĩι       -0.16
ë§¥       -0.15
amax      -0.14
å·±       -0.14
kate      -0.14
Ø´Ùĩ      -0.14
ilians    -0.14
rones     -0.13
anela     -0.13
Positive Logits
themselves  0.18
antu        0.16
ition       0.15
ody         0.14
Mean        0.14
mans        0.14
princip     0.14
asso        0.13
iores       0.13
NT          0.13
Activations Density 0.082%