    NEGATIVE LOGITS
     separat      1.08
     ACM          0.97
    (not shown)   0.96
     MNR          0.96
     ACLU         0.95
     DMA          0.94
     LWR          0.94
     ERC          0.93
     AMA          0.93
    Cavaliers     0.92
    POSITIVE LOGITS
    ف             1.20
    ри            0.96
    ди            0.95
    т             0.89
    ран           0.87
    де            0.86
    (not shown)   0.85
    بر            0.84
    (not shown)   0.84
    ری            0.83
    Act Density 0.000%

    No Known Activations