    Explanations

    praising you for exceptional performance

    Negative Logits
    " adultery"     0.41
    " hoping"       0.40
    "udian"         0.39
    " trying"       0.37
                    0.37
    "となります"    0.37
                    0.37
    "primarily"     0.36
    " harap"        0.36
    "carros"        0.36
    Positive Logits
    " skillfully"     0.81
    " admirably"      0.78
    " excellently"    0.76
    " deserve"        0.75
    " impressively"   0.71
    " thoughtfully"   0.70
    " successfully"   0.70
    " beautifully"    0.70
    " deserves"       0.69
    "Successfully"    0.69
    Act Density 0.015%

    No Known Activations