INDEX
Explanations
words related to restrictions, regulations, and prohibited content
themes related to restrictions and regulations regarding access to various types of content or materials
Negative Logits
Token     Logit
Skies     -0.79
guts      -0.72
jaws      -0.71
Cars      -0.70
Weeks     -0.70
Dogs      -0.70
Balls     -0.68
Eyes      -0.68
Tycoon    -0.68
Dreams    -0.67
Positive Logits
Token           Logit
unrelated       1.01
unregulated     1.01
nont            1.00
inappropriate   0.99
undesirable     0.99
inaccessible    0.99
non             0.99
identifiable    0.98
questionable    0.97
spurious        0.97
Activations Density: 0.164%