INDEX
Explanations
concepts related to systemic structures and their implications within society
Negative Logits
would     -0.19
may       -0.19
was       -0.18
cannot    -0.17
is        -0.17
avra      -0.16
must      -0.16
lerdi     -0.16
should    -0.16
ozilla    -0.16
Positive Logits
ever         0.31
be           0.30
EVER         0.21
Ever         0.21
been         0.21
necessarily  0.19
Ever         0.18
Really       0.17
ever         0.17
REALLY       0.16
Activations Density 0.108%