Explanations
phrases that challenge societal norms or highlight moral dilemmas
Negative Logits
oldt           -0.17
yer            -0.15
croft          -0.15
Ut             -0.15
isted          -0.14
chers          -0.14
ìĤ¬ëĬĶ         -0.14
ãĥ³ãĥķ         -0.14
borough        -0.13
.googleapis    -0.13
Positive Logits
diseñador      0.18
oon            0.14
ily            0.14
occo           0.14
iew            0.14
é¤             0.14
itar           0.14
Bonnie         0.14
nder           0.14
NÄĽm           0.13
Activations Density: 0.144%