Explanations
words related to choices or preferences
discussions around the interaction of various systems and the implications of choices within societal and governmental structures
Negative Logits
estern             -0.67
oso                -0.64
NOT                -0.61
ãĥ´                -0.60
suspic             -0.58
NOT                -0.58
thous              -0.57
repre              -0.56
notwithstanding    -0.56
INCLUD             -0.53
Positive Logits
itself             1.10
altogether         1.05
anymore            0.98
outright           0.91
themselves         0.86
nor                0.83
oneself            0.81
necessarily        0.78
himself            0.77
ourselves          0.75
Activations Density 0.554%
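
As a rough illustration of how a density figure like the one above could be read, the sketch below assumes that "activation density" means the fraction of sampled token positions on which the feature fires (activation above zero). That definition, the function name `activation_density`, the `threshold` parameter, and the toy data are all assumptions made for illustration; they are not taken from this page.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds `threshold`.

    Assumption: "density" here means the share of positions where the
    feature is active, not an average activation magnitude.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Toy data (hypothetical): 1000 token positions, 6 of them active.
toy = [0.0] * 994 + [1.2, 0.4, 3.1, 0.9, 2.2, 0.7]
density = activation_density(toy)
print(f"{density:.3%}")  # prints 0.600%
```

Under this reading, a density of 0.554% would correspond to the feature firing on roughly 5 or 6 of every 1000 sampled tokens.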