INDEX
    Explanations

    clear explanations, good coverage

    Negative Logits
     ignorance       0.49
     ignorant        0.45
     간단            0.42  (Korean, "simple")
     retweet         0.41
     trivial         0.39
    简单的           0.39  (Chinese, "simple")
     unknowingly     0.39
     ignor           0.39
    ใบ               0.38  (Thai, "leaf; sheet")
     सहयोग           0.38  (Hindi, "cooperation")
    Positive Logits
     pedagogical     0.76
     Coverage        0.68
     treatments      0.67
     pedagog         0.66
     treatment       0.65
     undergraduate   0.65
     exposition      0.64
     Treatments      0.63
     Treatment       0.62
    Coverage         0.62
    Activation Density: 0.015%

    No Known Activations