INDEX
Explanations
expressions of societal observations and moral dilemmas
Negative Logits
unicorn       -0.15
Simpsons      -0.14
PIO           -0.14
abei          -0.14
ernes         -0.14
åijĺ           -0.14
extra         -0.13
reetings      -0.13
astronaut     -0.13
æĪ¶           -0.13
Positive Logits
circumstance  0.24
Nature        0.22
gravity       0.21
nature        0.21
fate          0.19
History       0.19
Nature        0.19
Those         0.19
gravity       0.19
history       0.19
Activations Density 0.127%