INDEX
    Explanations
    Negative Logits
     reluct              -0.77
    ailability           -0.67
     misunderstanding    -0.67
    姫                    -0.65
     misunder            -0.64
    cules                -0.64
     Mans                -0.61
     Palestin            -0.60
    Downloadha           -0.58
    nces                 -0.58
    Positive Logits
    lining               1.41
    lined                1.14
    liner                1.07
    line                 1.03
    lines                0.99
    ers                  0.93
    stream               0.91
    liners               0.88
    atcher               0.87
    flow                 0.84
    Act Density 0.021%

    No Known Activations