INDEX
Explanations
phrases related to discussions of ethical and social issues
Negative Logits
Token         Logit
angu          -0.07
ransition     -0.07
mark          -0.07
inkel         -0.06
izo           -0.06
lings         -0.06
pty           -0.06
iyim          -0.06
odynam        -0.06
alla          -0.06
Positive Logits
Token         Logit
âĢĮâĢĮ        0.07
actively      0.07
yped          0.06
aben          0.06
root          0.06
unfold        0.06
.pb           0.06
Benn          0.06
Ùĥس          0.06
عات        0.06
Activations Density: 0.061%