INDEX
    Explanations

    English fake news datasets

    NEGATIVE LOGITS
    " Inequality"        0.49
    " inequality"        0.46
    " Inequal"           0.46
    "ˁ"                  0.46
    (blank token)        0.45
    " 중요"              0.44
    " SUMMARY"           0.44
    " 나타"              0.43
    (blank token)        0.43
    "s"                  0.43
    POSITIVE LOGITS
    " streets"           0.47
    "et"                 0.45
    " enclave"           0.44
    "ut"                 0.43
    " habituellement"    0.43
    "ikke"               0.43
    "ul"                 0.42
    "ider"               0.42
    "ائج"                0.42
    " tabella"           0.41
    Act Density 0.000%

    No Known Activations