INDEX
Explanations
words related to health and a variety of effects or conditions
Negative Logits
Token      Logit
-runner    -0.18
lying      -0.15
letcher    -0.14
öh         -0.14
ancell     -0.14
üssen      -0.14
corros     -0.14
jeme       -0.14
nackte     -0.14
loon       -0.14
Positive Logits
Token      Logit
ellschaft  0.27
Ges        0.19
ellig      0.18
Ellen      0.17
und        0.17
ichte      0.17
ells       0.17
ell        0.17
ocks       0.15
rack       0.15
Activations Density 0.010%