Explanations
mentions of Valentine's Day and related contexts
Negative Logits
nie     -0.17
Ùij     -0.15
hattan  -0.15
uman    -0.15
lessly  -0.14
živ     -0.14
teenth  -0.14
atím    -0.14
sed     -0.14
notes   -0.13
Positive Logits
ism      0.18
ian      0.17
ornings  0.16
andum    0.16
ized     0.15
ista     0.15
ft       0.15
idad     0.14
izers    0.14
apolis   0.14
Activations Density: 0.038%