INDEX
Explanations
discussions around moral and ethical dilemmas, particularly those involving rights and violence
Negative Logits
andon      -0.17
ací        -0.17
abl        -0.15
lider      -0.14
ledge      -0.14
opsis      -0.14
abl        -0.14
dater      -0.14
ambre      -0.14
esper      -0.14
Positive Logits
advance    0.40
serve      0.40
advancing  0.37
serves     0.36
further    0.36
serving    0.35
advance    0.35
served     0.34
serve      0.34
Serve      0.34
Activations Density 0.355%