INDEX
    Explanations

    words related to forceful and impactful actions

    words related to flashy or attention-grabbing actions

    Negative Logits
    ervation         -0.80
     Gutenberg       -0.77
    yden             -0.76
    otype            -0.75
    brance           -0.74
    swer             -0.74
    asure            -0.73
    elsen            -0.71
    ussen            -0.71
    communication    -0.71
    Positive Logits
    OUT              0.80
    Hur              0.77
    IELD             0.76
    arthy            0.76
    ASH              0.75
    hur              0.74
    eed              0.73
    Comment          0.73
    Ùħ               0.72
    UFF              0.70
    Activation Density: 0.020%

    No Known Activations