    Explanations

    terms related to harmful or dangerous individuals or entities

    mentions of "killer" in various contexts, which likely indicates a focus on terms associated with dangerous entities or situations

    Negative Logits
    rity            -0.89
    bles            -0.80
    urn             -0.77
    ational         -0.77
    ional           -0.77
    edu             -0.74
    ured            -0.72
    ratulations     -0.71
    rir             -0.70
    urat            -0.69
    Positive Logits
     killer         1.14
    killer          1.05
     killers        0.97
     Killer         0.91
     spree          0.84
     whales         0.82
     whale          0.76
    knife           0.75
     beware         0.72
    fish            0.71
    Act Density 0.013%
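
    To put the density figure in more concrete terms, the percentage can be converted to an expected firing rate. This is a minimal sketch that assumes "Act Density" means the percentage of sampled tokens on which the feature activates; the page itself does not define the term, and the helper name `density_to_rate` is hypothetical.

```python
# Assumption: "Act Density" is the percentage of sampled tokens on which
# the feature has a nonzero activation. The dashboard does not state this.

def density_to_rate(density_percent: float, per_tokens: int = 10_000) -> float:
    """Convert a density given in percent to expected activations per `per_tokens` tokens."""
    return density_percent / 100.0 * per_tokens

# Density value copied from the dashboard.
rate = density_to_rate(0.013)
print(f"about {rate:.1f} activations per 10,000 tokens")
```

    Under that assumption the feature is very sparse, firing on roughly one token in several thousand.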

    No Known Activations