    Explanations

    words that introduce relative clauses

    Negative Logits
    Token          Logit
    " ext"         -0.16
    "atura"        -0.15
    "ropa"         -0.15
    "/layouts"     -0.15
    "iya"          -0.15
    "readcr"       -0.15
    " Morr"        -0.15
    " genu"        -0.14
    "baum"         -0.14
    "ckett"        -0.14
    Positive Logits
    Token          Logit
    "ung"          0.15
    "ÑĤаб"         0.14
    " dziew"       0.14
    "SWG"          0.14
    " bü"          0.14
    "elect"        0.14
    "maal"         0.13
    "eless"        0.13
    "ãĥªãĥ¼ãĤº"    0.13
    " ç¼"          0.13
    Activation Density: 0.009%

    No Known Activations