    Explanations
    No Explanations Found
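    Negative Logits (token, value)
        " not"            0.64
        " it"             0.57
        " at"             0.55
        " specifically"   0.55
        " нужно"          0.54   (Russian: "need to")
        " только"         0.51   (Russian: "only")
        " only"           0.51
        " أن"             0.49   (Arabic: "that / to")
        " plutôt"         0.49   (French: "rather")
        " almost"         0.48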
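    Positive Logits (token, value)
        " 등이"           1.04   (Korean: "etc." + subject particle)
        "etc"             1.04
        " وغيرها"         0.99   (Arabic: "and others")
        " इत्यादी"        0.93   ("etc.", Devanagari script)
        " 등의"           0.93   (Korean: "etc." + genitive particle)
        " 등으로"         0.92   (Korean: "etc." + instrumental particle)
        " ইত্যাদি"        0.92   (Bengali: "etc.")
        " 그리고"         0.89   (Korean: "and")
        " 등을"           0.87   (Korean: "etc." + object particle)
        " প্রভৃতি"        0.86   (Bengali: "and so on")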
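The two logit lists rank tokens by how strongly this feature pushes their output logits down or up. This page does not say how those values are computed. The sketch below is minimal and rests on one assumption: each value is the dot product of the feature's decoder direction with a token's unembedding vector, and the lists are the k lowest and k highest of those products. The names `logit_effects` and `top_logits` are hypothetical, and so are the toy vectors. The sketch keeps the sign on negative effects, but the negative-logit values on this page are shown without a sign.

```python
from typing import Dict, List, Sequence, Tuple

Pair = Tuple[str, float]


# Hypothetical helper (assumption): a feature's effect on a token's logit is
# the dot product of its decoder direction with that token's unembedding vector.
def logit_effects(direction: Sequence[float],
                  unembed: Dict[str, Sequence[float]]) -> Dict[str, float]:
    return {tok: sum(d * w for d, w in zip(direction, vec))
            for tok, vec in unembed.items()}


def top_logits(effects: Dict[str, float], k: int) -> Tuple[List[Pair], List[Pair]]:
    """Return the k most positive and the k most negative (token, effect) pairs."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])
    negative = ranked[:k]        # most negative first
    positive = ranked[::-1][:k]  # most positive first
    return positive, negative


# Toy 2-dimensional example with made-up numbers.
direction = [1.0, 0.0]
unembed = {"etc": [1.0, 0.5], " not": [-0.6, 2.0], " and": [0.2, 0.0]}
positive, negative = top_logits(logit_effects(direction, unembed), k=1)
print(positive)  # [('etc', 1.0)]
print(negative)  # [(' not', -0.6)]
```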
    Activation Density: 4.927%
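This sketch reads activation density as the percentage of sampled tokens on which the feature's activation is strictly positive. That definition is an assumption, because the page gives only the number. Under that reading, 4.927% means the feature fires on roughly one token in twenty. The function name `activation_density` and the sample activations are hypothetical.

```python
from typing import Sequence


def activation_density(activations: Sequence[float]) -> float:
    """Percentage of tokens with a strictly positive activation (assumed definition)."""
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > 0)
    return 100.0 * firing / len(activations)


print(activation_density([0.0, 0.3, 0.0, 0.0, 1.2]))  # 40.0
```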

    No Known Activations