    Explanations

    words related to power, strength, and performance

    terms related to damage, power, performance, and other metrics of effectiveness in various contexts

    Negative Logits
    Token             Logit
    "creen"           -0.75
    "ovie"            -0.69
    "uthor"           -0.66
    " Tale"           -0.65
    " Rue"            -0.65
    "igl"             -0.62
    " Friendship"     -0.62
    " Vote"           -0.61
    " Beck"           -0.60
    "ournal"          -0.59
    Positive Logits
    Token             Logit
    " compared"       0.87
    " capability"     0.81
    "iencies"         0.80
    " efficiency"     0.79
    " output"         0.77
    " advantages"     0.77
    " capabilities"   0.77
    " advantage"      0.75
    " requirements"   0.74
    " destro"         0.74
    Activation Density: 0.241%

    No Known Activations