INDEX
    Explanations
    Negative Logits
     ویکی‌پدی      -0.76
    #             -0.68
     kasarigan    -0.67
    /***/         -0.55
    ="@+          -0.51
     thăng        -0.50
    šana          -0.50
    illoma        -0.49
    іга           -0.48
    bardier       -0.48
    Positive Logits
     mouse        1.02
     Mouse        1.02
     Mice         0.98
     mice         0.98
    mouse         0.97
     rodents      0.95
     MOUSE        0.94
    🐁            0.90
    🐭            0.88
     rodent       0.87
    Activation Density 0.183%

    No Known Activations