INDEX
    Explanations

    words related to action or behavior

    words that indicate judgment or decision-making processes

    Negative Logits
        Token          Logit
        " includ"      -0.76
        " poem"        -0.68
        " suffix"      -0.67
        "feat"         -0.65
        "RB"           -0.65
        " tune"        -0.64
        "fit"          -0.61
        "mat"          -0.59
        " vis"         -0.58
        " liberate"    -0.57

    Positive Logits
        Token          Logit
        "nces"         0.86
        "ered"         0.85
        "igion"        0.81
        "ased"         0.79
        "ragon"        0.77
        "rals"         0.75
        "wolves"       0.75
        "oll"          0.74
        "uled"         0.73
        "emption"      0.73

    Activation Density: 0.009%

    No Known Activations