    NEGATIVE LOGITS
    "^(@)"        -0.99
    " faſt"       -0.98
    " myſelf"     -0.98
    " Efq"        -0.93
    " محفوظة"     -0.91
    " Jefus"      -0.90
    " fevere"     -0.89
    "ſelves"      -0.88
    " Majefty"    -0.88
    " iſt"        -0.87

    POSITIVE LOGITS
    " Gre"        0.43
    " den"        0.43
    " di"         0.43
    "زع"          0.42
    " مايو"       0.42
    " Gran"       0.41
    "-"           0.41
    " Let"        0.41
    ","           0.40
    " so"         0.40

    Activation Density: 0.107%

    No Known Activations