INDEX
    Explanations
    Negative Logits
     myſelf         -1.44
     Efq            -1.35
    providedIn      -1.32
     Roskov         -1.32
     Monfieur       -1.29
     pleaſure       -1.27
     becauſe        -1.23
     himſelf        -1.22
     ſtate          -1.21
     againſt        -1.20
    Positive Logits
     New            0.55
     he             0.52
    (token blank in source)  0.50
    (token blank in source)  0.48
     in             0.47
     "              0.47
     ne             0.47
     idea           0.46
     home           0.46
     la             0.45
    Act Density 1.316%

    No Known Activations