INDEX
    Explanations

    personal pronouns and their usage in relationships and interactions

    NEGATIVE LOGITS
    Token            Logit
    "allow"          -0.20
    "åħģ"            -0.16
    "cht"            -0.16
    " Allow"         -0.16
    "Allow"          -0.16
    "permit"         -0.15
    " allow"         -0.15
    " Alone"         -0.15
    " superClass"    -0.15
    "ÅĻev"           -0.14
    POSITIVE LOGITS
    Token            Logit
    " with"          0.31
    " through"       0.28
    " along"         0.25
    " understand"    0.25
    " towards"       0.25
    " navigate"      0.24
    "with"           0.23
    " toward"        0.23
    " avoid"         0.23
    " stay"          0.22
    Activation Density: 0.068%

    No Known Activations