INDEX
    Explanations

    phrases introducing examples or instances, often indicated by the word "for"

    Negative Logits
    inding
    -0.17
    imals
    -0.15
    ニック
    -0.15
    entiful
    -0.15
    ールド
    -0.14
     سخ
    -0.14
    NSNotification
    -0.14
    vang
    -0.14
     kostenlose
    -0.14
    ím
    -0.14
    Positive Logits
    oday
    0.18
    unately
    0.17
    ستر
    0.16
     reasons
    0.15
    rant
    0.15
    اجات
    0.15
    reason
    0.14
     instance
    0.14
     reason
    0.14
     us
    0.14
    Act Density 0.078%

    No Known Activations