INDEX
    Negative Logits
    Token                        Logit
    (token missing in source)    -0.47
    " D"                         -0.44
    "دام"                        -0.44
    "homa"                       -0.43
    "ONOM"                       -0.40
    " F"                         -0.40
    " B"                         -0.39
    "pan"                        -0.39
    " I"                         -0.38
    "cor"                        -0.38
    Positive Logits
    Token            Logit
    " themſelves"    1.16
    " Efq"           1.16
    " leaſt"         1.14
    " Anſ"           1.12
    " myſelf"        1.12
    " himſelf"       1.09
    " Shakspeare"    1.06
    " becauſe"       1.05
    " whoſe"         1.04
    " Majefty"       1.03
    Activation Density: 0.216%

    No Known Activations