    Explanations

    references to rugby players and their achievements

    Negative Logits
    " pleaſure"         -0.72
    " ftate"            -0.65
    " houſe"            -0.64
    " ſtate"            -0.63
    "צלחה"              -0.63
    "atürk"             -0.63
    " reafon"           -0.63
    " ſame"             -0.61
    " DeWitt"           -0.59
    " faſt"             -0.59

    Positive Logits
    " rugby"            1.22
    " Rugby"            1.09
    "Rugby"             1.07
    "rugby"             0.99
    "🏉"                 0.86
                        0.68
    "arXiv"             0.63
    "IndentedString"    0.63
    "quelize"           0.58
    " msglen"           0.57
    Activation Density 0.021%

    No Known Activations