INDEX
Explanations
greetings
instances of the greeting "Hello"
NEGATIVE LOGITS
partisans   -0.78
utton       -0.77
reserv      -0.71
udic        -0.70
adjud       -0.65
grounds     -0.65
agency      -0.65
Dug         -0.65
exped       -0.65
dispos      -0.65
POSITIVE LOGITS
Hello       3.58
Hello       3.45
hello       2.81
hello       2.81
Hi          1.96
Hi          1.73
Goodbye     1.71
reetings    1.68
greeting    1.45
Dear        1.41
Activations Density: 0.018%