INDEX
    Head Attr Weights
    Head   Weight
    0      0.08
    1      0.04
    2      0.08
    3      0.09
    4      0.08
    5      0.07
    6      0.08
    7      0.08
    8      0.08
    9      0.07
    10     0.10
    11     0.09
    Negative Logits
    Token          Logit
    " CONTR"       -2.10
    " STORY"       -1.88
    " STATS"       -1.81
    " CLIENT"      -1.80
    "ENDED"        -1.70
    " SERVICES"    -1.68
    " Editorial"   -1.66
    "PATH"         -1.63
    "Reviewer"     -1.62
    " LIFE"        -1.60
    Positive Logits
    Token          Logit
    "auga"         1.63
    " surprise"    1.54
    "lyak"         1.54
    " disguised"   1.53
    "atz"          1.52
    "thia"         1.50
    " Qin"         1.50
    "atus"         1.49
    "pse"          1.48
    "dq"           1.48
    Act Density: 0.000%

    No Known Activations