INDEX
    Explanations

    references to "direction" or related terms

    Negative Logits
    "ediator"    -0.17
    "endale"     -0.16
    "ничес"      -0.15
    "arend"      -0.15
    "евич"       -0.15
    "edo"        -0.15
    "ardy"       -0.15
    "ê"          -0.14
    "nda"        -0.14
    "erman"      -0.14
    Positive Logits
    "ally"         0.18
    "ality"        0.17
    "yes"          0.16
    "-thinking"    0.15
    "atty"         0.15
    "749"          0.15
    "(direction"   0.15
    " direction"   0.15
    " toward"      0.15
    "nings"        0.15
    Act Density 0.077%

    No Known Activations