    Explanations

    expressions of hope and optimism

    Negative Logits
    "him"         -0.17
    "them"        -0.16
    "ivi"         -0.15
    " herself"    -0.15
    " lui"        -0.15
    "rowning"     -0.14
    "acro"        -0.13
    "zs"          -0.13
    "好き"        -0.13
    " himself"    -0.13

    Positive Logits
    " they"        0.33
    "lessly"       0.33
    " that"        0.32
    " someday"     0.32
    " it"          0.28
    " we"          0.28
    " to"          0.27
    " this"        0.25
    " there"       0.25
    " others"      0.24
    Activation density: 0.040%
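    The page does not define the activation density figure. Under the usual convention, which is assumed here, it is the percentage of dataset tokens on which the feature fires. The sketch below uses a hypothetical `activation_density` helper with a zero threshold chosen for illustration. It shows that 0.040% corresponds to roughly 2 firing tokens in every 5,000.

```python
# Hypothetical sketch (assumed definition): activation density is the share of
# tokens whose feature activation exceeds a threshold, as a percentage.
from typing import Sequence

def activation_density(acts: Sequence[float], threshold: float = 0.0) -> float:
    if not acts:
        raise ValueError("need at least one activation")
    fired = sum(1 for a in acts if a > threshold)  # tokens where the feature fires
    return 100.0 * fired / len(acts)

# 2 firing tokens out of 5,000 gives 0.040%.
acts = [0.0] * 4998 + [1.2, 0.7]
print(f"{activation_density(acts):.3f}%")  # prints 0.040%
```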

    No Known Activations