INDEX
    Explanations

    words indicating contribution or involvement in various contexts

    Negative Logits
    "arp"       -0.17
    "laces"     -0.17
    "lace"      -0.15
    "ifier"     -0.15
    "ify"       -0.15
    "oton"      -0.15
    "arm"       -0.15
    "lac"       -0.15
    "el"        -0.14
    "undy"      -0.14

    Positive Logits
    " towards"   0.23
    " toward"    0.23
    "utory"      0.20
    " Towards"   0.19
    "Towards"    0.18
    "uting"      0.18
    " factors"   0.17
    "icut"       0.17
    " Tow"       0.17
    "ally"       0.16
    Act Density 0.024%

    No Known Activations