Explanations
images or image-related HTML code
occurrences of image-related content
Negative Logits
Token          Logit
termination    -0.80
termin         -0.78
parity         -0.70
choke          -0.68
itive          -0.66
ACTED          -0.64
Sawyer         -0.63
throat         -0.63
paraly         -0.63
backdoor       -0.63
Positive Logits
Token          Logit
img            3.74
images         2.64
image          1.79
cdn            1.77
uploads        1.62
img            1.57
static         1.40
boards         1.38
files          1.29
forums         1.25
Activations Density 0.029%
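
The density figure above is usually read as the fraction of token positions on which the feature fires at all. The sketch below illustrates that reading; it assumes "fires" means an activation strictly greater than zero, and the function name `activation_density` and the sample data are hypothetical, not taken from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: a feature "fires" on a token when its activation is
    strictly greater than `threshold` (0.0 by default).
    """
    if not activations:
        return 0.0
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Hypothetical sample: the feature fires on 3 of 10,000 tokens.
sample = [0.0] * 9997 + [1.2, 0.4, 3.1]
density = activation_density(sample)
print(f"{density:.3%}")  # prints 0.030%
```

A density near 0.03% means the feature is very sparse: it is active on roughly 3 tokens in every 10,000, which fits a feature that responds narrowly to image-related content.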