INDEX
    Explanations
    Negative Logits
    Token            Logit
    " question"      -0.22
    "ibern"          -0.18
    " questioned"    -0.16
    " Question"      -0.16
    " however"       -0.16
    "onders"         -0.16
    "question"       -0.16
    " frag"          -0.16
    "-question"      -0.15
    " However"       -0.15
    Positive Logits
    Token            Logit
    "aidu"           0.17
    " otherwise"     0.16
    "andro"          0.16
    " OTHERWISE"     0.15
    "åħ·ä½ĵ"         0.15
    " yoksa"         0.15
    "TestFixture"    0.15
    "ynom"           0.15
    "ä½"             0.15
    "Va"             0.15
    Activation Density: 0.108%

    No Known Activations