INDEX
    Explanations

    questions asking "why" or "what"

    Negative Logits
    t              0.67
    Benzoimidazol  0.63
     Dionys        0.60
     Chaplin       0.59
    MyHomePage     0.59
    ת              0.58
     Yarm          0.58
     vutto         0.57
    Gambar         0.57
    이지만         0.56
    Positive Logits
    ز              0.63
    多い           0.62
     कहता          0.61
    potential      0.58
    ٢              0.57
    policy         0.57
    }/>            0.57
    ir             0.57
    und            0.56
    un             0.55
    Act Density 0.002%

    No Known Activations