INDEX
    Explanations
    Head Attr Weights
    0:0.14
    1:0.04
    2:0.02
    3:0.11
    4:0.08
    5:0.08
    6:0.12
    7:0.03
    8:0.20
    9:0.08
    10:0.02
    11:0.03
    Negative Logits
     motion
    -1.91
    lihood
    -1.83
     pse
    -1.80
    -1.69
    tsky
    -1.67
     perjury
    -1.67
     etc
    -1.65
     Lect
    -1.62
     fallacy
    -1.62
     Rothschild
    -1.61
    Positive Logits
    ��
    1.96
    1.93
    ドラ
    1.89
    reddit
    1.83
    WithNo
    1.82
    Mini
    1.82
    Reviewed
    1.82
    1.81
    scrib
    1.80
    1.80
    Act Density 0.000%

    No Known Activations