INDEX
    Explanations
    Head Attr Weights
    0:0.08
    1:0.09
    2:0.08
    3:0.08
    4:0.09
    5:0.07
    6:0.07
    7:0.07
    8:0.08
    9:0.08
    10:0.08
    11:0.08
    Negative Logits
    -2.88
     Tok
    -2.61
     Dek
    -2.52
     Dys
    -2.48
     trope
    -2.46
     bourgeoisie
    -2.43
    ilogy
    -2.41
    usterity
    -2.40
    -2.40
     privatization
    -2.36
    Positive Logits
    ennis       2.62
    "],"        2.47
    pit         2.46
    ulent       2.42
     oblig      2.30
    compliance  2.28
    aceous      2.25
    esm         2.22
    inf         2.16
    headers     2.15
    Act Density 0.000%

    No Known Activations