INDEX
    Explanations
    NEGATIVE LOGITS
    "ي"       0.81
    "i"       0.72
    "and"     0.69
    "י"       0.64
    "ای"      0.63
    "л"       0.58
    "u"       0.55
              0.54
    " for"    0.54
    "ت"       0.53
    POSITIVE LOGITS
    " trivia"       0.46
    " вспо"         0.45
    " reprim"       0.44
    " decades"      0.43
    "底层"          0.42
    " Jahrze"       0.42
    " itinerant"    0.42
    " annih"        0.42
    " adolesc"      0.41
    " jaren"        0.41
    Activation Density: 0.672%

    No Known Activations