Explanations
discussions about moral and ethical dilemmas in society
Negative Logits
rather        -0.25
vier          -0.16
somewhat      -0.15
sogar         -0.15
aze           -0.15
rather        -0.15
plutôt        -0.15
repet         -0.14
repetitive    -0.14
又            -0.14
Positive Logits
suddenly       0.23
nor            0.19
sudden         0.18
necessarily    0.17
nor            0.16
randomly       0.15
llx            0.15
rocket         0.15
overnight      0.15
Suddenly       0.15
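Logit lists like the two above are commonly produced by projecting a feature's decoder direction through the model's unembedding matrix and ranking the resulting per-token scores. The sketch below illustrates that idea in pure Python under that assumption; it is not taken from the source, and the function name `logit_effects`, its parameters, and the toy data are hypothetical. Some tools also fold in the final LayerNorm or other corrections, which this sketch omits.

```python
# Hypothetical sketch: rank tokens by a feature's direct logit effect.
# Assumption: effect[t] = sum_i decoder_dir[i] * unembed[i][t]  (decoder row @ W_U).

def logit_effects(decoder_dir, unembed, vocab, k=10):
    """Return (most_negative, most_positive) lists of (token, effect) pairs.

    decoder_dir: list[float], length d_model (the feature's decoder direction)
    unembed:     d_model rows, each a list[float] of length n_vocab (W_U)
    vocab:       list[str], length n_vocab
    """
    # Dot product of the decoder direction with each column of the unembedding.
    effects = [
        sum(d * row[t] for d, row in zip(decoder_dir, unembed))
        for t in range(len(vocab))
    ]
    ranked = sorted(zip(vocab, effects), key=lambda pair: pair[1])
    most_negative = ranked[:k]        # most suppressed tokens, most negative first
    most_positive = ranked[::-1][:k]  # most promoted tokens, most positive first
    return most_negative, most_positive


# Toy usage with d_model = 2 and a four-token vocabulary (illustrative data only).
neg, pos = logit_effects(
    decoder_dir=[1.0, -0.5],
    unembed=[[0.2, -0.1, 0.0, 0.3],
             [0.4, 0.2, -0.2, 0.0]],
    vocab=["a", "b", "c", "d"],
    k=2,
)
```

In the toy data, token "b" receives the most negative effect (-0.2) and "d" the most positive (0.3), so they head their respective lists, mirroring how "rather" and "suddenly" head the lists above.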
Activation density: 0.302%
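A density of 0.302% means the feature fired on roughly 3 of every 1,000 tokens in the sampled text. The sketch below shows one plausible way to compute such a figure, assuming "fired" means an activation strictly above a threshold of zero; the function name `activation_density` and the `threshold` parameter are hypothetical and not taken from the source.

```python
# Hypothetical sketch: fraction of token positions where a feature is active.
# Assumption: a token counts as "firing" when its activation exceeds `threshold`.

def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation is above `threshold`."""
    if not activations:
        return 0.0
    firing = sum(1 for a in activations if a > threshold)
    return firing / len(activations)


# Toy usage: 3 firing positions out of 1,000 tokens.
acts = [0.0] * 997 + [1.2, 0.4, 2.0]
density = activation_density(acts)
print(f"Activation density: {density:.3%}")
```

With these toy activations the printed density is 0.300%, close to the 0.302% reported for this feature.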