INDEX
    Explanations

    phrases related to admitting something or self-awareness

    expressions of acknowledgment or admission of difficult truths

    Negative Logits
    Token            Logit
    "ItemTracker"    -0.81
    " srf"           -0.76
    " Unloaded"      -0.66
    "ammy"           -0.64
    "zzi"            -0.62
    "ibaba"          -0.62
    " attendant"     -0.61
    "ums"            -0.61
    " quint"         -0.61
    " Nanto"         -0.60
    Positive Logits
    Token            Logit
    "enance"         0.94
    "cliffe"         0.86
    "ively"          0.84
    "uate"           0.84
    "ible"           0.81
    "lled"           0.75
    " anything"      0.73
    "ibly"           0.73
    "able"           0.71
    "rist"           0.70
    Activation Density: 0.270%

    No Known Activations