INDEX
    Explanations

    words related to the word "don't."

    Head Attr Weights
    0:0.08
    1:0.05
    2:0.09
    3:0.07
    4:0.08
    5:0.09
    6:0.08
    7:0.08
    8:0.07
    9:0.08
    10:0.10
    11:0.10
    Negative Logits
    " surv":-1.79
    " rewrite":-1.63
    " サーティワン":-1.61
    " survives":-1.59
    "metics":-1.59
    " CoC":-1.58
    "ertation":-1.48
    "ividual":-1.45
    "hyde":-1.44
    " discipline":-1.44
    Positive Logits
    " Spit":1.81
    "Friend":1.72
    "inav":1.64
    "DK":1.63
    " abroad":1.61
    "nings":1.57
    " Ambassador":1.55
    "INTON":1.55
    "gin":1.54
    "reb":1.52
    Act Density 0.000%

    No Known Activations