INDEX
    Explanations

    references to guidance and leading influence in narratives

    Negative Logits
    elman       -0.17
    going       -0.16
    à¹Įà¹Ģà¸ŀ   -0.15
    for         -0.15
    tails       -0.15
    _scalar     -0.15
    ระà¸Ķ       -0.14
    Ac          -0.14
    dik         -0.14
    ź           -0.14

    Positive Logits
     into        0.31
     toward      0.27
     towards     0.25
     away        0.25
     astr        0.24
     Into        0.24
    into         0.22
     Away        0.21
     INTO        0.20
     onto        0.19
    Activation Density: 0.068%

    No Known Activations