    Explanations

    sentences expressing personal opinions or emotional reflections

    Negative Logits
        "Luckily"        -0.17
        " fortunately"   -0.15
        " thankfully"    -0.15
        "ンズ"           -0.15
        " Luckily"       -0.14
        "Typed"          -0.14
        " Dummy"         -0.14
        "enerator"       -0.14
        " luckily"       -0.14
        "_compat"        -0.13
    Positive Logits
        " truth"          0.79
        "truth"           0.62
        " honest"         0.57
        " Truth"          0.56
        "Truth"           0.54
        " honestly"       0.53
        "_truth"          0.48
        " honesty"        0.47
        " verdad"         0.45   (Spanish for "truth")
        " truthful"       0.43
    Activation Density: 0.333%

    No Known Activations