INDEX
    Explanations

    greetings or salutations in the text

    Negative Logits
    "arra"         -0.16
    "nte"          -0.15
    "åύ"          -0.15
    "éĢļ"          -0.14
    "οÏħÏĤ"       -0.13
    ".tv"          -0.13
    "furt"         -0.13
    " parts"       -0.13
    "ube"          -0.13
    "ters"         -0.13
    Positive Logits
    "ooo"           0.26
    " everyone"     0.24
    "oooo"          0.23
    "oo"            0.23
    " everybody"    0.23
    "oooooooo"      0.22
    " Everyone"     0.22
    " Kitty"        0.22
    "everyone"      0.20
    "_world"        0.20
    Act Density 0.018%

    No Known Activations