    Explanations

    phrases that express relationships or affiliations, particularly focusing on the word "of"

    Negative Logits
    Token        Logit
    " slope"     -0.06
    "ina"        -0.06
    "535"        -0.06
    " sheer"     -0.06
    " with"      -0.06
    "tree"       -0.06
    " at"        -0.06
    " by"        -0.06
    "ilda"       -0.06
    " своими"    -0.05
    Positive Logits
    Token        Logit
    "chin"       0.08
    "riter"      0.07
    "cher"       0.07
    " مخصوص"     0.07
    "avl"        0.07
    "atürk"      0.07
    "čan"        0.07
    "sonian"     0.07
    ".cbo"       0.07
    "Feat"       0.07
    Act Density 0.027%

    No Known Activations