    Explanations

    references to filtering methods and their efficiency

    Negative Logits
     wobec            -0.39
    PasswordEncoder   -0.35
    atchewan          -0.33
     bijzonder        -0.32
     fréquent         -0.32
     marchandises     -0.31
     kalangan         -0.31
    specialchars      -0.31
     ویکی‌پدی         -0.30
     Angaben          -0.29
    Positive Logits
     filming          0.98
     filters          0.95
    Plot              0.95
    Fil               0.94
     Plot             0.91
     filmed           0.91
     filtr            0.91
     Fil              0.90
     Filters          0.88
    plot              0.85
    Act Density 0.301%

    No Known Activations