Explanations
terms related to political and social issues, including government statements, public opinions, and political events
Negative Logits
Token        Logit
disadvant    -0.92
mathemat     -0.84
incorpor     -0.83
lawy         -0.79
prolifer     -0.79
vulner       -0.78
levers       -0.78
carbohyd     -0.78
misunder     -0.77
promoters    -0.75
Positive Logits
Token            Logit
ï¸ı              1.67
女               1.00
âĶĢâĶĢ           0.99
ï¸               0.97
âĻ               0.97
Balt             0.95
âĸ               0.94
£                0.92
âĶĢâĶĢâĶĢâĶĢ     0.91
âĹ               0.90
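Several of the positive-logit tokens look like mojibake but are more likely raw byte-level BPE token strings. The sketch below assumes, without confirmation from this page, that the vocabulary uses the GPT-2 style byte-to-character table; the helper names bytes_to_unicode and decode_token are illustrative, not part of any dashboard API. Tokens that hold only part of a multi-byte UTF-8 character cannot be decoded on their own, so the function returns None for them.

```python
def bytes_to_unicode():
    """Byte -> printable character table (assumed GPT-2 style byte-level BPE)."""
    # Bytes that are already printable map to themselves.
    keep = list(range(0x21, 0x7F)) + list(range(0xA1, 0xAD)) + list(range(0xAE, 0x100))
    table = {b: chr(b) for b in keep}
    # Every remaining byte is shifted to code points 256, 257, ... in order.
    shift = 0
    for b in range(256):
        if b not in table:
            table[b] = chr(256 + shift)
            shift += 1
    return table


CHAR_TO_BYTE = {c: b for b, c in bytes_to_unicode().items()}


def decode_token(token):
    """Return the text a token string stands for, or None if it is a partial character."""
    # Characters outside the table mean the token was already decoded (e.g. CJK text).
    if any(c not in CHAR_TO_BYTE for c in token):
        return token
    raw = bytes(CHAR_TO_BYTE[c] for c in token)
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return None  # incomplete UTF-8 sequence


if __name__ == "__main__":
    for tok in ["ï¸ı", "女", "âĶĢâĶĢ", "âĻ", "Balt"]:
        decoded = decode_token(tok)
        shown = "partial UTF-8 sequence" if decoded is None else repr(decoded)
        print(f"{tok!r} -> {shown}")
```

Under that assumption, âĶĢ corresponds to the box-drawing character U+2500 and ï¸ı to the variation selector U+FE0F, while tokens such as âĻ, âĸ, and âĹ are two-byte prefixes of three-byte symbol characters.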
Activations Density: 0.626%