INDEX
    Explanations

    phrases related to choices and voluntary actions

    Head Attr Weights
    0:0.04
    1:0.01
    2:0.07
    3:0.09
    4:0.28
    5:0.03
    6:0.05
    7:0.14
    8:0.04
    9:0.05
    10:0.09
    11:0.05
    Negative Logits
    gered         -1.80
    uncture       -1.69
    ggles         -1.67
    ensable       -1.64
    ggle          -1.60
    onent         -1.60
    ankind        -1.60
    iferation     -1.59
    functional    -1.57
    ewitness      -1.56
    Positive Logits
     simplicity    1.72
     mild          1.58
     minimalist    1.55
     scraps        1.53
    龍�            1.51
     sunset        1.51
     underdog      1.45
     caveat        1.43
     Mk            1.40
     quieter       1.38
    Act Density 0.002%

    No Known Activations