    NEGATIVE LOGITS
    "eners"          0.73
    "Ano"            0.67
    (token missing)  0.65
    " durations"     0.65
    " जानता"         0.65
    " भएको"          0.63
    "优先"            0.62
    " dagar"         0.62
    (token missing)  0.62
    " வான"           0.62

    POSITIVE LOGITS
    " down"          1.02
    " डाउन"          0.95
    "down"           0.91
    " à"             0.87
    " Down"          0.85
    "Down"           0.74
    (token missing)  0.71
    " next"          0.70
    " ao"            0.70
    " À"             0.65

    Act Density 0.091%

    No Known Activations