    Explanations

    discussions around societal dilemmas and existential questions

    Negative Logits
    "Were"          -0.17
    " Were"         -0.16
    "nt"            -0.13
    "_ctxt"         -0.13
    " opposed"      -0.12
    " fueron"       -0.12
    ".undefined"    -0.12
    " navr"         -0.11
    "ieten"         -0.11
    ".getBean"      -0.11

    Positive Logits
    " is"           0.64
    " has"          0.58
    " isn"          0.47
    " can"          0.46
    " will"         0.46
    " may"          0.43
    " seems"        0.42
    " appears"      0.41
    " hasn"         0.41
    " does"         0.40
    Activation Density: 10.778%

    No Known Activations