INDEX
Explanations
phrases referencing concepts of design, moral philosophy, and hierarchical relationships
Negative Logits
ued         -0.16
amp         -0.16
arium       -0.15
inke        -0.14
zeichnet    -0.14
ude         -0.13
ing         -0.13
pt          -0.13
áo          -0.13
omaly       -0.13
Positive Logits
sheer           0.23
matters         0.17
finances        0.16
how             0.16
specifically    0.15
personal        0.15
konkrét         0.15
general         0.14
Sharper         0.14
tual            0.14
Activations Density 0.122%