INDEX
Explanations
discussions about moral and ethical dilemmas related to personal beliefs and practices
Negative Logits
raiſ       -0.81
itſelf     -0.78
ſever      -0.78
ſta        -0.78
purpoſe    -0.77
iſt        -0.76
pleaſure   -0.76
myſelf     -0.75
deſt       -0.75
houſe      -0.74
Positive Logits
mr         0.65
q          0.64
south      0.61
k          0.59
north      0.59
ko         0.54
zu         0.53
m          0.53
y          0.53
j          0.52
Activations Density: 2.194%