    Negative Logits
    Token          Logit
    "effort"       -0.94
    " pleaſure"    -0.82
    " delight"     -0.81
    " ſtate"       -0.75
    " effort"      -0.75
    " Effort"      -0.74
    " raiſ"        -0.73
    "Effort"       -0.72
    " defire"      -0.69
    "efforts"      -0.69
    Positive Logits
    Token                 Logit
    "ing"                 0.69
    "ful"                 0.64
    "ings"                0.63
    "migrationBuilder"    0.57
    "en"                  0.54
    "ant"                 0.53
    "way"                 0.53
    " Wicidata"           0.50
    "fic"                 0.50
    "fi"                  0.50
    Activation Density: 1.542%

    No Known Activations