INDEX
    Explanations

    words related to drug use or references

    terms related to self-inflicted harm or injury

    Negative Logits
    token          logit
    " dues"        -0.74
    " fares"       -0.69
    " grades"      -0.69
    " Rover"       -0.69
    "ARY"          -0.65
    " standalone"  -0.64
    " semester"    -0.64
    " stag"        -0.64
    " resumes"     -0.64
    " Stand"       -0.62
    Positive Logits
    token          logit
    "inf"           4.17
    "Inf"           2.26
    " Inf"          1.65
    " inf"          1.39
    "inst"          1.16
    "infect"        1.14
    "inc"           1.12
    "inter"         1.04
    " INF"          1.01
    "imm"           1.00
    Activation Density: 0.011%

    No Known Activations