INDEX
    Explanations

    instances of the word "those" and similar demonstrative pronouns

    Negative Logits
    Token         Logit
    "ault"        -0.21
    "å¯"          -0.15
    "zell"        -0.15
    "ancell"      -0.14
    "tober"       -0.14
    "ndon"        -0.14
    "ags"         -0.14
    "عات"         -0.14
    "verse"       -0.14
    "rix"         -0.13

    Positive Logits
    Token         Logit
    "akin"         0.15
    "curity"       0.15
    "laughter"     0.15
    " PyTuple"     0.14
    "pra"          0.14
    "fst"          0.14
    "opsy"         0.14
    "657"          0.14
    " beiden"      0.14
    "alara"        0.14
    Activation Density: 0.128%

    No Known Activations