Explanations
references to Valentine's Day and romantic themes
Negative Logits
Token    Logit
/Dk      -0.16
/oct     -0.15
eum      -0.15
åĩī      -0.15
moid     -0.15
çĵľ      -0.15
avir     -0.15
tober    -0.14
ÑĢеб     -0.14
ç¥       -0.14
Positive Logits
Token        Logit
Valentine    0.47
Valent       0.40
romantic     0.38
val          0.36
Val          0.35
romance      0.35
February     0.34
February     0.34
Feb          0.33
cupid        0.32
Activation density: 0.041%