INDEX
    Explanations

    so/very followed by descriptive word

    Negative Logits
    "و"         0.41
    "ل"         0.38
    "ور"        0.37
    "ين"        0.35
    " expts"    0.34
    "ك"         0.34
    " codons"   0.33
                0.33
    "чках"      0.33
    "수에"      0.33
    Positive Logits
    " was"      0.43
    "to"        0.43
    " to"       0.41
    " "         0.36
    "tn"        0.36
    " pada"     0.36
    "ton"       0.35
    " \"        0.35
    "ti"        0.34
    "ts"        0.33
    Act Density 0.292%
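
The dashboard does not say how the density figure above is computed. A minimal sketch, assuming it is the fraction of sampled token positions on which the feature's activation is above zero; the function name `activation_density`, the `threshold` parameter, and the sample data are hypothetical and only for illustration:

```python
def activation_density(activations, threshold=0.0):
    """Fraction of token positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of sampled tokens on which the
    feature fires at all; the dashboard's exact definition is not given.
    """
    if not activations:
        return 0.0
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Made-up sample: the feature fires on 3 of 1000 token positions.
sample = [0.0] * 997 + [1.2, 0.4, 2.5]
density = activation_density(sample)
print(f"{density:.3%}")  # prints 0.300%
```

Under this reading, a density of 0.292% would correspond to the feature firing on roughly 3 of every 1000 sampled tokens.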

    No Known Activations