    NEGATIVE LOGITS
    political   -0.07
    .ob         -0.07
    Pack        -0.07
    _samples    -0.06
    brains      -0.06
    acker       -0.06
     pit        -0.06
     sob        -0.06
     problems   -0.06
     acid       -0.06
    POSITIVE LOGITS
    ']])↵       0.07
     Rounds     0.07
    +"<         0.07
    (style      0.07
     '''↵       0.07
    ']:↵        0.07
     "<         0.07
     Nickel     0.07
    נקודה       0.07
    gments      0.07
    Activation Density 0.020%

    No Known Activations