    Negative Logits
    utral  0.44
    ρισ  0.42
    kie  0.38
     এটাই  0.37
    cta  0.37
    СС  0.37
    жень  0.36
     zagro  0.36
    requested  0.35
     solely  0.35
    Positive Logits
     deviate  0.56
    (no token shown)  0.56
     berubah  0.53
     fluctuate  0.51
     вари  0.50
    実際  0.50
     fluctuates  0.49
     changed  0.49
     değiş  0.48
    (no token shown)  0.48
    Act Density 0.003%

    No Known Activations