Explanations
discussions around societal dilemmas and existential questions
Negative Logits
Were          -0.17
Were          -0.16
nt            -0.13
_ctxt         -0.13
opposed       -0.12
fueron        -0.12
.undefined    -0.12
navr          -0.11
ieten         -0.11
.getBean      -0.11

Positive Logits
is            0.64
has           0.58
isn           0.47
can           0.46
will          0.46
may           0.43
seems         0.42
appears       0.41
hasn          0.41
does          0.40
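
The two lists above are the ten most negative and ten most positive entries of the feature's effect on output-token logits. A minimal sketch of that ranking step, assuming the per-token effects are already available as (token, value) pairs (the function name top_logits is hypothetical). A list of pairs is used rather than a dict because the same surface string can appear twice, as "Were" does above, when tokens differ only in leading whitespace.

```python
def top_logits(effects: list[tuple[str, float]], k: int = 10):
    """Return the k most negative and k most positive (token, effect) pairs.

    Hypothetical helper: `effects` is assumed to hold one (token, value)
    pair per vocabulary entry, e.g. the feature's logit contribution.
    """
    # Sort ascending by effect value; ties keep their input order.
    ranked = sorted(effects, key=lambda pair: pair[1])
    negative = ranked[:k]         # most negative first
    positive = ranked[::-1][:k]   # most positive first
    return negative, positive


if __name__ == "__main__":
    sample = [("is", 0.64), ("has", 0.58), ("Were", -0.17), ("nt", -0.13), ("can", 0.46)]
    neg, pos = top_logits(sample, k=2)
    print(neg)  # [('Were', -0.17), ('nt', -0.13)]
    print(pos)  # [('is', 0.64), ('has', 0.58)]
```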

Activation density: 10.778%
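
Activation density is read here as the percentage of sampled tokens on which the feature fires. A minimal sketch, assuming "fires" means an activation strictly above a threshold of 0.0 (the helper name activation_density and the threshold parameter are hypothetical; the dashboard's exact sampling and threshold are not stated).

```python
def activation_density(activations: list[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose activation exceeds `threshold`.

    Hypothetical helper: assumes one activation value per sampled token.
    """
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


if __name__ == "__main__":
    acts = [0.0, 0.5, 0.0, 0.0, 1.2, 0.0, 0.0, 0.0, 0.0, 0.0]
    print(activation_density(acts))  # 20.0
```

Under this reading, a density of 10.778% means the feature was active on roughly one token in nine of the sample.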