INDEX
    Explanations

    mentions of legal or controversial matters

    punctuation and specific numerical contexts

    Negative Logits
        ensity    -0.69
        iphate    -0.69
        ouble     -0.67
        mble      -0.67
        Explore   -0.67
        "}],"     -0.66
        rossover  -0.66
        rawl      -0.65
        icult     -0.65
        omever    -0.64
    Positive Logits
         prompting   1.30
         thereby     1.29
         causing     1.20
         triggering  1.16
         resulting   1.15
         sparking    1.13
         forcing     1.12
         thus        1.06
         ruining     1.05
         depri       1.05
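The two lists above rank vocabulary tokens by how strongly the feature pushes their output logits down or up. Dashboards of this kind commonly derive such lists by projecting the feature's decoder direction through the model's unembedding matrix and sorting the result; this page does not state its method, so treat that as an assumption. The sketch below uses a toy 2-dimensional residual stream and a 6-token vocabulary built so that the projection reproduces a few of the listed values. The names `top_logits`, `decoder_dir`, and `W_U` are hypothetical.

```python
import numpy as np

def top_logits(decoder_dir, W_U, vocab, k=10):
    """Rank tokens by decoder_dir @ W_U (assumed logit-projection method)."""
    logits = decoder_dir @ W_U        # one score per vocabulary token
    order = np.argsort(logits)        # indices sorted from lowest to highest score
    negative = [(vocab[i], float(logits[i])) for i in order[:k]]
    positive = [(vocab[i], float(logits[i])) for i in order[::-1][:k]]
    return negative, positive

# Toy data (hypothetical): the decoder direction selects row 0 of W_U,
# so the projected logits equal that row exactly.
vocab = [" causing", " thus", "ensity", "ouble", " prompting", "rawl"]
decoder_dir = np.array([1.0, 0.0])
W_U = np.array([
    [1.20, 1.06, -0.69, -0.67, 1.30, -0.65],
    [0.50, -0.20, 0.30, 0.10, -0.40, 0.70],
])

negative, positive = top_logits(decoder_dir, W_U, vocab, k=2)
print(negative)  # [('ensity', -0.69), ('ouble', -0.67)]
print(positive)  # [(' prompting', 1.3), (' causing', 1.2)]
```

Note that tokens keep their leading spaces (" prompting" vs. "ensity"), which is why the positive list above is made of whole words while the negative list is mostly word fragments.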
    Act Density 0.369%
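Activation density is usually reported as the share of sampled token positions on which the feature fires. Assuming "fires" means an activation strictly above zero (the page does not give its threshold or sample size), 0.369% corresponds to roughly 369 firing positions per 100,000 tokens. A minimal sketch, with a hypothetical helper name `activation_density` and a synthetic activation vector:

```python
import numpy as np

def activation_density(acts, threshold=0.0):
    """Fraction of token positions where the feature's activation exceeds threshold."""
    acts = np.asarray(acts)
    return float(np.mean(acts > threshold))

# Synthetic example: 369 nonzero activations out of 100,000 token positions.
acts = np.zeros(100_000)
acts[:369] = 1.0
print(f"{activation_density(acts):.3%}")  # 0.369%
```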

    No Known Activations