    Explanations

    phrases related to honesty and self-reflection

    Negative Logits
     bay        -0.16
     Flem       -0.15
    omik        -0.15
    stant       -0.15
    ukan        -0.14
    avia        -0.14
    овари       -0.14
     Fleming    -0.14
    メラ        -0.14
    hoe         -0.14
    Positive Logits
    enthal      0.18
    ologne      0.16
    abler       0.15
    andas       0.14
     Strauss    0.14
     chick      0.14
    gov         0.14
    umbo        0.14
    nist        0.13
     generado   0.13
    Activation Density: 0.348%

    No Known Activations