INDEX
    Explanations
    Negative Logits
    Token          Logit
    " only"        -1.32
    " to"          -1.25
    " other"       -1.04
    " in"          -0.94
    " forklar"     -0.94
    " well"        -0.94
    " vanske"      -0.91
    " during"      -0.90
    " inorder"     -0.88
    " within"      -0.87
    Positive Logits
    Token          Logit
    " for"         1.61
    "again"        1.41
    " profusely"   1.37
    " again"       1.32
    "Again"        1.27
    " مرة"         1.20
    " tekrar"      1.20
    " sincerely"   1.12
    "Thank"        1.08
    " sekali"      1.05
    Act Density 0.016%

    No Known Activations