INDEX
    Explanations

    auxiliary verbs/pronouns

    Negative Logits
     prisons        -0.07
    fiction         -0.06
     Cheng          -0.06
    ugeot           -0.06
     lor            -0.06
    Identification  -0.06
     STYLE          -0.06
     photoc         -0.06
     SHR            -0.06
     whether        -0.06
    Positive Logits
     بت             0.07
     l              0.07
    ';↵↵↵           0.07
     demok          0.06
     gehen          0.06
     คน             0.06
     celou          0.06
     d              0.06
    GG              0.06
    Act Density 0.076%

    No Known Activations