INDEX
Explanations
instances of the word "love" and its variations in various contexts
Negative Logits
es        -0.23
ez        -0.21
ej        -0.20
ek        -0.19
eo        -0.18
esin      -0.18
esine     -0.17
eses      -0.17
ele       -0.16
eki       -0.16
Positive Logits
ewise     0.23
emaker    0.21
etime     0.21
ethe      0.21
ings      0.20
eman      0.19
ETIME     0.19
INGS      0.19
eworthy   0.18
ewis      0.18
Activations Density 0.110%