INDEX
Explanations
the concept of "nature" in various contexts or domains
Negative Logits
icle      -0.17
ery       -0.17
baz       -0.16
adian     -0.16
runner    -0.16
ulu       -0.15
imiz      -0.15
oup       -0.15
nga       -0.15
rael      -0.15
Positive Logits
lle          0.24
istically    0.19
aleza        0.18
istic        0.18
/ag          0.17
erre         0.17
zÄĻ          0.17
fully        0.16
sted         0.16
áº           0.16
Activations Density 0.019%