INDEX
    Explanations

    Against the odds

    Negative Logits (token, logit weight)
        " "              -0.08
        " a"             -0.08
        "="              -0.08
        " I"             -0.08
        " an"            -0.07
        " simplified"    -0.07
        " HAPPY"         -0.07
        " express"       -0.07
        "-service"       -0.07
        " ="             -0.07
    Positive Logits (token, logit weight)
        " skepticism"     0.14
        " despair"        0.14
        " skeptic"        0.14
        " skeptical"      0.12
        " hopeless"       0.12
        " skept"          0.12
        " pessim"         0.12
        " preconce"       0.11
        " والي"           0.11
        " scept"          0.11
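    The two lists above rank vocabulary tokens by how strongly this feature pushes their logits down or up. What follows is a minimal sketch of how lists like these are commonly produced. It assumes, and this page does not state, that the weights come from projecting the feature's decoder direction through the model's unembedding matrix. `logit_weights`, `top_logits`, and all numbers in the usage section are hypothetical toy illustrations, not this dashboard's code or data.

```python
from typing import List, Sequence, Tuple


def logit_weights(decoder_dir: Sequence[float],
                  unembed: Sequence[Sequence[float]]) -> List[float]:
    """Project a feature's decoder direction through the unembedding.

    ASSUMPTION: unembed[i][t] is the weight from residual-stream dimension i
    to vocabulary token t (a d_model x vocab_size matrix).
    """
    vocab_size = len(unembed[0])
    return [sum(decoder_dir[i] * unembed[i][t] for i in range(len(decoder_dir)))
            for t in range(vocab_size)]


def top_logits(weights: Sequence[float], vocab: Sequence[str], k: int = 10
               ) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most negative and the k most positive (token, weight) pairs."""
    pairs = list(zip(vocab, weights))
    negative = sorted(pairs, key=lambda p: p[1])[:k]                # most negative first
    positive = sorted(pairs, key=lambda p: p[1], reverse=True)[:k]  # most positive first
    return negative, positive


if __name__ == "__main__":
    vocab = [" skeptic", " a", " hopeless"]          # toy 3-token vocabulary
    weights = logit_weights([1.0, 0.5],              # toy 2-d decoder direction
                            [[0.1, -0.1, 0.1],
                             [0.1, 0.0, 0.0]])       # toy 2 x 3 unembedding
    negative, positive = top_logits(weights, vocab, k=1)
    print(negative, positive)
```

    Sorting twice keeps the sketch simple; Python's sort is stable, so tokens with tied weights keep their vocabulary order in both lists.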
    Activation Density: 0.141%
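    An activation density of 0.141% means the feature fires on roughly 141 of every 100,000 token positions. Below is a minimal sketch of that calculation. It assumes, and the page does not say, that "fires" means an activation strictly greater than zero. `activation_density` and the `threshold` parameter are hypothetical names.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Fraction of token positions where the feature's activation exceeds threshold.

    ASSUMPTION: a position counts as firing when activation > threshold (default 0).
    """
    if not activations:
        return 0.0
    firing = sum(1 for a in activations if a > threshold)
    return firing / len(activations)


if __name__ == "__main__":
    # Toy data: 141 firing positions out of 100,000 tokens.
    acts = [0.0] * 99_859 + [1.0] * 141
    print(f"{activation_density(acts):.3%}")
```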

    No Known Activations