    Negative Logits
    Token               Logit
    "ItemBackground"    -0.69
    " XNUMX"            -0.68
    " PCP"              -0.59
    " Platon"           -0.57
    " snippetHide"      -0.55
    ")))),"             -0.54
    " Sermons"          -0.54
    " Stare"            -0.54
    " imgur"            -0.54
    " considérons"      -0.54
    Positive Logits
    Token               Logit
    " the"              0.70
    " us"               0.64
    " me"               0.59
    " him"              0.56
    " a"                0.54
    " our"              0.52
    " your"             0.51
    " away"             0.49
    " his"              0.48
    "[toxicity=0]"      0.47
    Activation Density 0.292%

    No Known Activations