INDEX
    Explanations
    Negative Logits
    " is"          1.20
    " ("           0.96
    " are"         0.91
    "UT"           0.91
    "lt"           0.89
    "ätz"          0.89
    " shelters"    0.88
    " impeding"    0.83
    "üren"         0.82
    "ltal"         0.82
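    Positive Logits
    "ו" (Hebrew 'and')      1.63
    "in"                    1.52
    "و" (Arabic 'and')      1.39
    (token not rendered)    1.26
    (token not rendered)    1.26
    (token not rendered)    1.26
    (token not rendered)    1.23
    "มัน" (Thai 'it')        1.21
    (token not rendered)    1.17
    "v"                     1.15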
    Activation Density 0.270%

    No Known Activations