Explanations
words related to physical or emotional warmth, particularly in the context of clothing or care
Negative Logits
Token    Logit
ctor     -0.18
men      -0.18
ne       -0.18
mit      -0.17
ric      -0.17
xt       -0.17
raf      -0.16
rico     -0.16
matic    -0.16
nc       -0.15
Positive Logits
Token            Logit
Swe              0.23
eter             0.20
stakes           0.20
eters            0.19
eper             0.18
swe              0.18
instein          0.18
itzer            0.18
etch             0.17
.opendaylight    0.17
Activations Density 0.012%