INDEX
Explanations
references to nature or natural themes
Negative Logits
oup      -0.18
hyper    -0.16
upe      -0.15
elts     -0.15
iw       -0.15
otti     -0.14
prech    -0.14
elt      -0.14
aire     -0.14
ics      -0.13
Positive Logits
URAL     0.26
aniel    0.25
asha     0.23
Nat      0.22
ürlich   0.22
nat      0.21
ural     0.19
anson    0.19
Nat      0.19
sume     0.19
Activations Density 0.010%