INDEX
    Explanations

    texts related to personal experiences or interests

    discussions about personal interests and experiences

    Negative Logits
    " discredited"    -0.84
    " outraged"       -0.79
    " alleged"        -0.78
    " repud"          -0.77
    " accuser"        -0.77
    " retract"        -0.76
    " disputed"       -0.75
    " equivalent"     -0.74
    " retracted"      -0.74
    " dismant"        -0.73
    Positive Logits
    "Favorite"        1.28
    "Growing"         1.23
    "Recently"        1.21
    " Recently"       1.16
    "Being"           1.12
    "My"              1.11
    " Growing"        1.10
    " haha"           1.10
    "Learning"        1.08
    "Working"         1.07
    Act Density 0.402%

    No Known Activations