INDEX
    NEGATIVE LOGITS
    "ur"        1.16
    "in"        1.03
    "ى"         0.88
    "inę"       0.83
    "anager"    0.79
    "ння"       0.77
    "ing"       0.74
    "ч"         0.73
    "urized"    0.73
                0.72
    POSITIVE LOGITS
    " frogs"    1.11
    " frog"     1.01
    " Frog"     0.97
    "Frog"      0.95
                0.85
    " to"       0.84
                0.83
    "N"         0.82
                0.81
    " Ř"        0.80
    Act Density 0.007%

    No Known Activations