INDEX
Explanations
words related to drug use or references
terms related to self-inflicted harm or injury
Negative Logits
dues          -0.74
fares         -0.69
grades        -0.69
Rover         -0.69
ARY           -0.65
standalone    -0.64
semester      -0.64
stag          -0.64
resumes       -0.64
Stand         -0.62
Positive Logits
inf           4.17
Inf           2.26
Inf           1.65
inf           1.39
inst          1.16
infect        1.14
inc           1.12
inter         1.04
INF           1.01
imm           1.00
Activations Density 0.011%