    Negative Logits
        "<bos>"           -0.71
        " vainly"         -0.65
        " strode"         -0.60
        " mustered"       -0.58
        " engendered"     -0.58
        " symbolically"   -0.57
        " darted"         -0.56
        " leaped"         -0.56
        " impelled"       -0.55
        " leapt"          -0.54
    Positive Logits
        " Benjamin"       1.70
        "Benjamin"        1.56
        " Benjam"         1.16
        " catég"          0.96
        " Bén"            0.91
        "Jenny"           0.88
        " Jenny"          0.87
        " fasc"           0.84
        " polig"          0.83
        " Kategor"        0.83
    Activation Density: 0.207%

    No Known Activations