INDEX
Explanations
references to love and relationships with strong emotional connections
references to love and affection for others
Negative Logits
SPONSORED   -0.89
illin       -0.82
agher       -0.76
arta        -0.76
arat        -0.75
sk          -0.68
util        -0.67
ajo         -0.62
Dispatch    -0.61
inges       -0.61
Positive Logits
loved       1.03
dearly      0.90
uncond      0.87
nesday      0.77
76561       0.73
ĸļ          0.72
liked       0.71
Trident     0.70
loving      0.69
itely       0.69
Activations Density 0.012%