    Negative Logits
     entitled   -0.07
     bumped     -0.06
    .ALIGN      -0.06
     asleep     -0.06
                -0.06
     making     -0.06
     kind       -0.06
    'aff        -0.06
     spirited   -0.06
    rp          -0.06

    Positive Logits
    :           0.08
    .Inner      0.07
    gio         0.07
     ''↵↵       0.07
    temps       0.07
    ौल          0.07
    .cycle      0.07
    agues       0.07
    erture      0.06
    "group      0.06

    Act Density 0.021%

    No Known Activations