Explanation
References to darkness or dark themes
Negative Logits
é¦Ļ         -0.18
ighton      -0.16
ect         -0.16
plural      -0.15
klar        -0.15
@show       -0.15
ekk         -0.14
86          -0.14
rab         -0.14
846         -0.14
Positive Logits
ened         0.31
ening        0.27
-dark        0.23
smith        0.20
enment       0.17
ed           0.17
dark         0.17
/light       0.16
ly           0.16
itecture     0.15
Activation density: 0.033%