    Explanations

    statements or phrases that introduce generalizations or overarching comments

    Negative Logits
    "hey"     -0.15
    "ary"     -0.15
    "ors"     -0.15
    "ourt"    -0.15
    "ort"     -0.15
    "ours"    -0.14
    "ids"     -0.14
    "ся"      -0.14
    "ka"      -0.14
    "ed"      -0.14
    Positive Logits
    " speaking"    0.32
    "-purpose"     0.31
    "-speaking"    0.26
    "ised"         0.26
    "-то"          0.21
    "mente"        0.21
    " Speaking"    0.20
    "izations"     0.20
    "Speaking"     0.20
    "적인"         0.20
    Act Density 0.025%

    No Known Activations