INDEX
    Explanations

    references to potential dangers or threats, specifically related to traps and explosives

    references to the concept of "boby" and related terms in various contexts

    Negative Logits
    ioned     -0.83
    heses     -0.82
    isco      -0.76
    ivity     -0.73
    acular    -0.69
    oat       -0.69
    runner    -0.69
    acial     -0.68
    erer      -0.68
    ports     -0.67
    Positive Logits
    pta             0.92
    ãĥį (ネ)         0.91
    ãĥĥãĥĪ (ット)     0.87
    ë               0.78
    DoS             0.78
    AGES            0.77
    æ©Ł (機)         0.71
    ============    0.70
    ãĥ³ãĤ¸ (ンジ)     0.70
    ppo             0.69
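    Tokens such as ãĥį and æ©Ł look garbled because they are printed in the byte-level BPE display alphabet, where each raw byte is mapped to one printable character. Assuming the model uses GPT-2's byte-to-unicode table (the pattern matches, but the page does not name the tokenizer), the sketch below inverts that table and decodes the recovered bytes as UTF-8. `bytes_to_unicode` mirrors the function of that name in OpenAI's GPT-2 encoder; `decode_token` is a hypothetical helper for illustration.

```python
def bytes_to_unicode() -> dict[int, str]:
    """Rebuild the GPT-2 byte-to-character display table (assumed tokenizer scheme)."""
    # Bytes in these ranges are displayed as the character with the same code point.
    printable = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    table = {b: chr(b) for b in printable}
    # Every remaining byte is shifted, in order, to code points starting at 256.
    n = 0
    for b in range(256):
        if b not in table:
            table[b] = chr(256 + n)
            n += 1
    return table


# Inverse table: display character -> raw byte value.
CHAR_TO_BYTE = {ch: b for b, ch in bytes_to_unicode().items()}


def decode_token(display: str) -> str:
    """Hypothetical helper: turn a byte-level BPE display string into readable text."""
    raw = bytes(CHAR_TO_BYTE[ch] for ch in display)
    # A single token may hold a partial UTF-8 sequence, so mark undecodable bytes
    # with U+FFFD instead of raising an error.
    return raw.decode("utf-8", errors="replace")


for tok in ["ãĥį", "ãĥĥãĥĪ", "æ©Ł", "ãĥ³ãĤ¸", "ë", "pta"]:
    print(repr(tok), "->", repr(decode_token(tok)))
```

    Under that assumption, ãĥį decodes to ネ, ãĥĥãĥĪ to ット, æ©Ł to 機 and ãĥ³ãĤ¸ to ンジ, while ë is a lone UTF-8 lead byte (0xEB) that cannot be decoded on its own. Plain ASCII tokens such as pta pass through unchanged.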
    Activation Density: 0.046%
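    Activation density is commonly reported as the fraction of token positions on which a feature's activation is above zero. The page does not state its threshold or sample size, so both are assumptions in the sketch below, and the 100,000-token sample is hypothetical. At 0.046%, the feature fires on roughly one token in every 2,200 (1 / 0.00046 ≈ 2,174).

```python
def activation_density(activations: list[float], threshold: float = 0.0) -> float:
    """Fraction of token positions whose activation exceeds `threshold` (assumed definition)."""
    if not activations:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 46 active positions out of 100,000 tokens.
sample = [1.0] * 46 + [0.0] * (100_000 - 46)
density = activation_density(sample)
print(f"{density:.3%}")  # 0.046%
```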

    No Known Activations