INDEX
    Explanations

    gambling-, alcohol-, sex-, or crime-related completions

    Negative Logits
    Token           Value
     is             0.46
     Weierstrass    0.41
     underwhelming  0.40
     kilogram       0.39
    -{\             0.39
     cyclohex       0.39
     ):             0.38
     uppercase      0.38
     has            0.38
     arugula        0.37
    Positive Logits
    Token           Value
    (not shown)     0.93
    ”,              0.80
    "               0.79
    ’’              0.78
    (not shown)     0.77
    ”,              0.75
    ”、             0.74
    ”،              0.74
    ”—              0.72
    rdquo           0.70
    Activation Density: 0.293%

    No Known Activations