INDEX
    Explanations

    instances of high activation, suggesting an emphasis on key points in discussions or texts

    Negative Logits
    atile       -0.18
    oe          -0.16
    ye          -0.16
    ya          -0.15
    ARNING      -0.15
     Ay         -0.14
    LATED       -0.14
     pdu        -0.14
    Äĩe         -0.14
    yyy         -0.14
    Positive Logits
    boat        0.16
    apus        0.16
    ĥ           0.15
    elocity     0.15
    uzz         0.15
    atego       0.15
    urdy        0.14
    ãĥ¼ãĥģ      0.14
    ancode      0.14
    egasus      0.14
    Act Density 0.199%

    No Known Activations