Explanations
questions concerning societal structures and the dynamics within them
Negative Logits
Token    Logit
Sort     -0.15
Make     -0.15
CA       -0.14
ity      -0.14
Get      -0.14
Speak    -0.14
g        -0.14
eless    -0.14
iya      -0.14
Speak    -0.14
Positive Logits
Token    Logit
how      0.29
How      0.27
what     0.27
Does     0.26
-how     0.23
.Does    0.23
What     0.22
why      0.21
How      0.21
Does     0.21
Activations Density: 0.097%