INDEX
    Negative Logits
    " a"               0.79
    " refrigerators"   0.77
    "रा"               0.74
    ":"                0.72
    " bedrooms"        0.68
    " landscapes"      0.67
    " families"        0.66
    " caravans"        0.66
    " t"               0.64
    " backpacks"       0.64
    Positive Logits
    "chk"       0.71
    "ósz"       0.71
    "vorm"      0.70
    "നിര"       0.70
    "bai"       0.68
    " ब्लाइंड"   0.66
    "mixer"     0.62
    "conn"      0.61
    "ukone"     0.61
    " ويع"      0.61
    Activation Density: 0.013%

    No Known Activations