INDEX
    Explanations
    No Explanations Found
    Head Attr Weights
    0:0.07
    1:0.07
    2:0.08
    3:0.10
    4:0.09
    5:0.07
    6:0.09
    7:0.09
    8:0.07
    9:0.08
    10:0.08
    11:0.07
    Negative Logits
    nda  -2.09
    ovo  -1.86
    dot  -1.79
    liv  -1.76
    contin  -1.75
    zb  -1.72
     Sched  -1.71
    ubs  -1.71
    rone  -1.70
    ni  -1.69
    Positive Logits
     educate  1.92
     immersed  1.86
     architect  1.73
     educated  1.71
     philanthrop  1.67
     devote  1.67
     firsthand  1.65
     careful  1.63
     leap  1.63
     educating  1.63
    Act Density 0.000%

    No Known Activations

    This feature has no known activations.