INDEX
Explanations
words related to lightness or light-heartedness
Negative Logits
Ŀ         -0.17
ean       -0.17
iston     -0.16
him       -0.16
icht      -0.15
que       -0.15
uel       -0.15
lep       -0.14
738       -0.14
anges     -0.14
Positive Logits
ening     0.34
ning      0.31
nings     0.31
ened      0.30
-weight   0.28
enment    0.28
weights   0.28
-duty     0.27
NING      0.27
bul       0.27
Activations Density 0.021%