INDEX
    Explanations

    statements reflecting personal growth and self-improvement

    Negative Logits
    " perhaps"       -0.19
    " terribly"      -0.17
    "perhaps"        -0.17
    " folks"         -0.16
    " sort"          -0.16
    " Perhaps"       -0.15
    " incredibly"    -0.15
    " terrific"      -0.15
    "Perhaps"        -0.15
    "sort"           -0.15
    Positive Logits
    " kli"              0.16
    " doub"             0.15
    ".scalablytyped"    0.15
    "cház"              0.15
    "AdapterManager"    0.14
    "URITY"             0.14
    "lef"               0.14
    " बस"               0.14
    " kdyby"            0.14
    " —↵↵"              0.13
    Act Density 0.148%

    No Known Activations