INDEX
    Explanations
    NEGATIVE LOGITS
    Token               Logit
    " Please"           -0.37
    "Please"            -0.36
    " quite"            -0.36
    " unfortunately"    -0.36
    " cute"             -0.36
    "@@"                -0.35
    " Pic"              -0.35
    " Quite"            -0.35
    " Geme"             -0.35
    " interaction"      -0.34
    POSITIVE LOGITS
    Token               Logit
    " thankful"         0.80
    " grateful"         0.78
    " thankfully"       0.67
    "Thankfully"        0.67
    "Fortunately"       0.66
    "rateful"           0.63
    "ftagPool"          0.62
    " append"           0.61
    "Nutrient"          0.61
    " glad"             0.61
    Activation Density: 0.033%

    No Known Activations