INDEX
    NEGATIVE LOGITS
    ur        1.32
    O         1.29
    x         1.16
    f         1.01
    inas      0.94
    r         0.93
    v         0.92
    ه         0.92
    woven     0.89
    gro       0.86
    POSITIVE LOGITS
     złoż     1.11
     suced    1.09
     czasie   1.05
     powied   1.01
     chłop    1.01
    to        0.98
    '।        0.98
    л         0.97
     ajud     0.96
     испыта   0.95
    Act Density: 0.003%
    No Known Activations