INDEX
Explanations
phrases that emphasize the concept of morality and ethics
Negative Logits
Token       Logit
asio        -0.15
znik        -0.15
amples      -0.14
丸          -0.14
anno        -0.13
Kra         -0.13
889         -0.13
eligible    -0.13
finder      -0.13
uien        -0.13
Positive Logits
Token       Logit
result      0.33
opposite    0.32
equivalent  0.26
reverse     0.26
product     0.23
same        0.23
stuff       0.22
ologically  0.21
Result      0.20
result      0.20
Activations Density: 0.095%