INDEX
Explanations
descriptions of political events and movements
Negative Logits
ュ            -0.71
worldly      -0.70
Russ         -0.63
────         -0.60
zn           -0.59
ARY          -0.59
Interested   -0.59
ize          -0.58
ARCH         -0.58
Apps         -0.58
Positive Logits
joins        0.90
withdrew     0.82
denies       0.82
welcomed     0.79
wrote        0.77
was          0.77
teaches      0.77
welcomes     0.77
greets       0.76
urged        0.76
Activations Density 1.364%