INDEX
    Explanations

    defining or elaborating on issues

    Negative Logits
    (not shown)    1.16
    (not shown)    1.15
    "ని"    1.05
    "town"    1.05
    "jší"    1.05
    " it"    1.03
    "ır"    1.01
    " be"    1.00
    "table"    0.96
    "ться"    0.96
    Positive Logits
    "0"    1.49
    "ع"    1.35
    "al"    1.34
    "Т"    1.16
    "b"    1.15
    " gutes"    1.10
    "ad"    1.09
    "v"    1.09
    (not shown)    1.07
    "ور"    1.06
    Act Density 0.032%

    No Known Activations