INDEX
    Explanations

    the word "so" used in various contexts

    Negative Logits
    Token         Logit
    "roman"       -0.20
    " so"         -0.19
    "ting"        -0.19
    "work"        -0.17
    "phant"       -0.17
    "ature"       -0.16
    "b"           -0.15
    "uckle"       -0.15
    "rt"          -0.15
    "un"          -0.15
    Positive Logits
    Token         Logit
    "-called"      0.40
    "ooo"          0.26
    "oooo"         0.25
    "ething"       0.24
    "apy"          0.23
    "iled"         0.23
    "oth"          0.22
    "oner"         0.21
    "oooooooo"     0.21
    "aping"        0.19
    Activation Density: 0.038%

    No Known Activations