INDEX
Explanations
dark-related imagery and contexts
references to dark imagery and themes
Negative Logits
utable   -0.80
raltar   -0.80
kson     -0.76
llah     -0.76
ufact    -0.75
agine    -0.73
essors   -0.72
oples    -0.72
onent    -0.72
yip      -0.70
Positive Logits
ening    1.23
ened     1.08
recess   0.92
horse    0.89
moon     0.88
clouds   0.87
gray     0.85
dark     0.85
brown    0.83
grey     0.83
Activations Density: 0.030%