INDEX
    Explanations

    drop followed by specific word

    Negative Logits
    "্যাস"  0.43
    "উপ"  0.41
    " पॉ"  0.41
    " UP"  0.40
    " Smell"  0.40
    " обра"  0.40
    " stamp"  0.39
    " ಅನು"  0.39
    " सिरे"  0.38
    " Poh"  0.38
    Positive Logits
    "drop"  1.39
    " Drop"  1.38
    " drop"  1.33
    "Drop"  1.29
    "drops"  1.23
    " drops"  1.23
    " Dro"  1.19
    "Dro"  1.18
    " dropped"  1.11
    " dropping"  1.10
    Activation Density: 0.008%

    No Known Activations