INDEX
Explanations
words associated with happiness or expressions of gladness
Negative Logits
erli       -0.17
er         -0.15
i          -0.15
frau       -0.15
etics      -0.15
ersed      -0.14
lectual    -0.14
ersh       -0.14
eri        -0.14
werk       -0.14
Positive Logits
ys         0.24
stone      0.22
ness       0.20
win        0.19
tid        0.18
ewater     0.18
wyn        0.17
tid        0.17
dest       0.17
STONE      0.17
Activations Density: 0.005%