INDEX
Explanations
words related to cleanliness, organization, or positive attributes
phrases related to cleanliness and orderliness
Negative Logits
CVE           -0.71
asions        -0.68
Downloadha    -0.68
oan           -0.66
GA            -0.66
ioxide        -0.66
7601          -0.66
ativity       -0.66
Defenders     -0.65
lection       -0.64
Positive Logits
neat          1.06
ness          0.99
nesses        0.97
tidy          0.91
tid           0.85
icles         0.82
ilde          0.77
liness        0.75
contra        0.72
little        0.71
Activations Density: 0.008%