INDEX
Explanations
phrases related to well-wishing and greetings
mentions of significant events or emotional responses in narratives
Negative Logits
ÅŁ         -0.90
ÄŁ         -0.88
qqa        -0.88
âĶ         -0.86
Erd        -0.86
udeau      -0.86
Raqqa      -0.85
NAT        -0.82
igrant     -0.80
rapists    -0.79
Positive Logits
Jerry      2.20
Jerry      2.07
Garcia     1.58
Bobby      1.56
Billy      1.50
Grateful   1.43
Billy      1.43
Jimmy      1.42
Ronnie     1.36
Johnny     1.34
Activations Density 0.186%