INDEX
    Explanations

    Heavily focuses on detecting mentions of poisonous substances

    references to various types of poison

    Negative Logits
    Token          Logit
    "noon"         -0.83
    "dar"          -0.72
    "pora"         -0.67
    "dimension"    -0.66
    "blance"       -0.66
    "irc"          -0.66
    "Raid"         -0.64
    "aan"          -0.64
    "stand"        -0.63
    " Scouting"    -0.63
    Positive Logits
    Token          Logit
    " poisoning"   1.11
    " poison"      1.01
    " poisoned"    0.97
    " poisonous"   0.93
    " dart"        0.93
    " gas"         0.87
    " arsenic"     0.84
    "ously"        0.84
    " darts"       0.83
    " Ivy"         0.82
    Act Density 0.012%
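
    As a rough illustration of what a figure like 0.012% could correspond to, the sketch below assumes that activation density means the fraction of sampled token positions on which the feature's activation is above zero. That definition, the function name `activation_density`, and the sample data are assumptions made for illustration and are not taken from this page.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of token positions where the
    feature fires at all; the page itself does not define the term.
    """
    if not activations:
        return 0.0
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Hypothetical sample: 100,000 token positions, 12 of which fire.
acts = [0.0] * 100_000
for i in range(12):
    acts[i * 8_000] = 1.5  # arbitrary non-zero activation value

density = activation_density(acts)
print(f"{density:.3%}")  # prints "0.012%"
```

    Under that assumption, a density of 0.012% means roughly 12 firing positions per 100,000 tokens, which would fit a feature tied to a narrow topic such as poison.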

    No Known Activations