Explanations
phrases related to politics and government
Negative Logits
bender     -0.78
puff       -0.76
wic        -0.73
yang       -0.69
wrap       -0.69
FU         -0.68
Levine     -0.68
quart      -0.67
tar        -0.66
Shapiro    -0.65
Positive Logits
selves       1.32
own          1.20
ancestors    1.16
nation       1.04
selves       1.03
beloved      1.02
collective   1.01
hearts       1.01
asses        0.96
shores       0.95
Activations Density: 0.126%