    Explanations

    instances of emphasis or punctuation used to convey strong feelings or reactions

    Negative Logits
    Logit  Token
    -0.16  しく
    -0.16  --------------------
    -0.16  --
    -0.15  ----------------
    -0.15  iny
    -0.14  न
    -0.14  oud
    -0.14  iyel
    -0.14  ------------------------------------------------
    -0.14  we
    Positive Logits
    Logit  Token
    0.33   ————————————————
    0.30   ————————
    0.25   ————
    0.21   ÂĿ
    0.17   /+
    0.17   いた
    0.15   ような
    0.15   ir
    0.15   stant
    0.14   /-
    Activation Density: 0.076%

    No Known Activations