INDEX
Explanations
terms related to political parties and their actions
Negative Logits
]='\        -0.64
]+=         -0.59
אשר         -0.58
mıştır      -0.57
darstellt   -0.56
maktadır    -0.55
observable  -0.52
"]="        -0.51
おり        -0.51
日至        -0.50
Positive Logits
isn         1.39
didn        1.37
aren        1.36
wasn        1.35
wouldn      1.29
shouldn     1.29
ain         1.27
weren       1.22
really      1.21
doesn       1.20
Activations Density 0.514%