INDEX
    Explanations

    words related to confidentiality and secrecy

    references to sensitive information or topics

    Negative Logits
    FIN           -0.83
    LOAD          -0.75
     Wolver       -0.74
    UNCH          -0.73
    AUT           -0.72
     Truck        -0.72
    INST          -0.70
    aneers        -0.69
    amaz          -0.68
    swick         -0.68
    Positive Logits
     sensitive    1.17
    ivities       1.07
    mble          0.96
     sensit       0.80
    ively         0.79
     sensitivity  0.77
    itives        0.76
    sensitive     0.76
    ensitive      0.75
    itiz          0.75
    Activation Density 0.016%

    No Known Activations