INDEX
    Explanations
    Head Attr Weights
    0:0.08
    1:0.07
    2:0.06
    3:0.08
    4:0.09
    5:0.09
    6:0.07
    7:0.09
    8:0.09
    9:0.08
    10:0.07
    11:0.07
    Negative Logits
    " discern"      -1.99
    " navigate"     -1.97
    " alike"        -1.96
    "ascript"       -1.95
    " enabled"      -1.94
    " surn"         -1.89
    " therefore"    -1.88
    " wielding"     -1.84
    " prompted"     -1.81
    " intending"    -1.80
    Positive Logits
    "Metal"         2.19
    "MET"           2.04
    " fertilizer"   1.99
    "Toy"           1.93
    "reb"           1.92
    "Cro"           1.91
    "ł"             1.90
    " peanuts"      1.87
    " Soda"         1.84
    "ycle"          1.84
    Act Density: 0.000%
    No Known Activations