Explanations
questions related to moral or ethical dilemmas
Negative Logits
displayText    -0.66
Dragonbound    -0.65
ãģ«            -0.65
places         -0.62
isphere        -0.61
ãģĮ            -0.60
stood          -0.58
verage         -0.58
perty          -0.58
legram         -0.58
Positive Logits
olation        0.94
olated         0.94
berra          0.86
olate          0.81
anybody        0.72
anyone         0.70
peria          0.70
terness        0.69
ocy            0.69
htar           0.68
Activations Density 1.133%