INDEX
Explanations
words related to criticism and public opinion
Negative Logits
akin         -0.15
.gov         -0.15
anyahu       -0.14
ervas        -0.14
aris         -0.14
ategy        -0.14
ekim         -0.14
ourcem       -0.13
orem         -0.13
issen        -0.13
Positive Logits
critics       0.28
opponents     0.27
detr          0.27
oppon         0.26
academics     0.25
experts       0.25
some          0.24
prominent     0.24
groups        0.24
advocacy      0.24
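The two lists rank tokens by their logit value for this feature, most negative first and most positive first. A minimal sketch of how such a ranking could be produced, assuming the per-token values are already available as a token-to-value mapping. The function name `top_and_bottom` and the parameter `k` are hypothetical, and this is not the dashboard's own code:

```python
def top_and_bottom(logit_values: dict[str, float], k: int = 10):
    """Return the k most negative and k most positive (token, value) pairs.

    Hypothetical helper: assumes `logit_values` maps each token to its logit value.
    """
    # Sort ascending by value; Python's sort is stable, so ties keep insertion order.
    ranked = sorted(logit_values.items(), key=lambda kv: kv[1])
    negative = ranked[:k]            # smallest values first
    positive = ranked[::-1][:k]      # largest values first
    return negative, positive


# Illustrative input built from a few entries in the lists above.
sample = {"critics": 0.28, "opponents": 0.27, "akin": -0.15, ".gov": -0.15, "orem": -0.13}
neg, pos = top_and_bottom(sample, k=2)
print(neg)  # [('akin', -0.15), ('.gov', -0.15)]
print(pos)  # [('critics', 0.28), ('opponents', 0.27)]
```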
Activation Density: 0.639%
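Activation density is read here as the percentage of token positions on which the feature fires. A minimal sketch under that assumption, treating "fires" as an activation strictly above a threshold of 0. Both the threshold and the helper name `activation_density` are hypothetical, not taken from the dashboard:

```python
def activation_density(activations: list[float], threshold: float = 0.0) -> float:
    """Percentage of positions whose activation exceeds `threshold` (assumed definition)."""
    if not activations:
        raise ValueError("need at least one activation")
    # Count positions where the feature is considered active.
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Synthetic example: 639 active positions out of 100,000.
acts = [0.0] * 99_361 + [1.0] * 639
print(f"{activation_density(acts):.3f}%")  # 0.639%
```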