INDEX
    Explanations

    words related to significant actions or events

    references to various acts of wrongdoing or violence

    Negative Logits
    " sshd"           -0.79
    " Flavoring"      -0.78
    " corners"        -0.73
    " ceilings"       -0.70
    " Challenges"     -0.67
    "kees"            -0.65
    "ials"            -0.65
    "ernels"          -0.65
    " strands"        -0.65
    " Generations"    -0.64
    Positive Logits
    " sabotage"        1.00
    " kindness"        0.94
    " vandalism"       0.87
    "EVA"              0.83
    " heroism"         0.83
    " desperation"     0.80
    " aggression"      0.77
    " luck"            0.77
    " piracy"          0.72
    " violence"        0.70
    Activation Density: 0.048%

    No Known Activations