    Explanations

    instances of the word "show" and its variations, indicating a focus on demonstration or presentation

    Negative Logits
    Token       Logit
    "İ"         -0.15
    "pch"       -0.14
    "ucken"     -0.14
    "landır"    -0.14
    "ilogy"     -0.13
    "essel"     -0.13
    "leared"    -0.13
    "nish"      -0.13
    "arih"      -0.13
    "ará"       -0.13
    Positive Logits
    Token       Logit
    " signs"    0.35
    " how"      0.34
    " off"      0.32
    "-off"      0.29
    "boat"      0.28
    " up"       0.28
    " why"      0.28
    " Signs"    0.27
    "off"       0.26
    "-case"     0.26
    Activation Density: 0.095%

    No Known Activations