    Negative Logits
    Token     Value
    "is"      1.02
    "ad"      1.01
    "d"       0.92
    "a"       0.90
    "il"      0.86
    "ar"      0.84
    "re"      0.82
    "h"       0.80
    "in"      0.80
    "f"       0.77
    Positive Logits
    Token       Value
    " и"        0.80
    " و"        0.77
    " мани"     0.64
    (missing)   0.61
    "ിയ"        0.61
    (missing)   0.61
    (missing)   0.60
    (missing)   0.60
    " а"        0.59
    " ПО"       0.59
    Activation Density: 0.000%

    No Known Activations