INDEX
    Explanations
    Head Attr Weights
    0:0.02
    1:0.02
    2:0.13
    3:0.13
    4:0.11
    5:0.03
    6:0.05
    7:0.12
    8:0.04
    9:0.06
    10:0.07
    11:0.16
    Negative Logits
    Leaks:-1.33
    milo:-1.25
     Kardash:-1.23
     Labrador:-1.22
     encourages:-1.18
     hides:-1.18
     refers:-1.17
     hump:-1.16
    han:-1.15
    graph:-1.14
    Positive Logits
    ��極:1.53
    omore:1.49
    eers:1.48
    pole:1.46
    bj:1.41
    atana:1.37
    �士:1.34
    endo:1.33
    ��:1.32
     Ips:1.28
    Act Density 0.001%

    No Known Activations