INDEX
Explanations
endearments or terms of affection
terms of endearment and affectionate language
Negative Logits
ammers            -0.84
hner              -0.80
Cheong            -0.79
DoS               -0.76
ioch              -0.76
TPPStreamerBot    -0.74
IDER              -0.73
RAFT              -0.71
oker              -0.71
vernment          -0.70
Positive Logits
dear              1.12
dearly            0.89
departed          0.87
friend            0.81
old               0.78
hearts            0.77
admiration        0.75
friends           0.73
memories          0.73
uncle             0.72
Activations Density: 0.019%