Explanations
Occurrences of the word "love" in various contexts
Negative Logits
lage            -0.15
oot             -0.15
unner           -0.15
l               -0.15
.scalablytyped  -0.14
ootball         -0.14
λαν             -0.14
ural            -0.14
osy             -0.13
um              -0.13
Positive Logits
affair          0.15
fully           0.15
-kind           0.14
amentals        0.14
enci            0.14
full            0.14
arms            0.14
joy             0.13
ably            0.13
be              0.13
Activation density: 0.046%