Explanations
greetings or introductory phrases
occurrences of the phrase "Hello."
Negative Logits
Token        Logit
aic          -0.91
ucket        -0.84
rent         -0.83
eele         -0.76
ifiable      -0.75
hip          -0.74
arian        -0.71
arians       -0.67
nutrition    -0.66
abase        -0.66
Positive Logits
Token        Logit
Kitty         1.23
Neighbor      0.96
Goodbye       0.95
hello         0.85
Bye           0.82
bye           0.82
!.            0.82
Hello         0.80
!             0.77
Again         0.75
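The two lists above show the tokens whose output logits this feature most suppresses (negative) and most promotes (positive). Below is a minimal sketch of how such lists can be produced. It assumes they come from projecting the feature's decoder direction through the model's unembedding matrix, a logit-lens-style readout. The function names, toy vocabulary, and weights are hypothetical, and the dashboard's actual pipeline may differ, for example in normalization.

```python
from typing import List, Sequence, Tuple


def feature_logits(
    decoder_dir: Sequence[float],
    unembed: Sequence[Sequence[float]],  # hypothetical shape: [d_model][vocab_size]
) -> List[float]:
    """Project a feature's decoder direction onto every vocabulary token."""
    vocab_size = len(unembed[0])
    return [
        sum(decoder_dir[i] * unembed[i][v] for i in range(len(decoder_dir)))
        for v in range(vocab_size)
    ]


def top_and_bottom_tokens(
    logits: Sequence[float], vocab: Sequence[str], k: int
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most positive and the k most negative (token, logit) pairs."""
    pairs = list(zip(vocab, logits))
    top = sorted(pairs, key=lambda p: p[1], reverse=True)[:k]
    bottom = sorted(pairs, key=lambda p: p[1])[:k]
    return top, bottom


# Toy example with a 2-dimensional residual stream and a 4-token vocabulary.
vocab = ["Hello", "bye", "aic", "nutrition"]
decoder_dir = [1.0, 0.5]
unembed = [
    [1.0, 0.5, -1.0, -0.5],
    [0.0, 0.5, -1.0, -1.0],
]
logits = feature_logits(decoder_dir, unembed)
top, bottom = top_and_bottom_tokens(logits, vocab, k=2)
print(top)     # [('Hello', 1.0), ('bye', 0.75)]
print(bottom)  # [('aic', -1.5), ('nutrition', -1.0)]
```

Python's sort is stable, including with `reverse=True`. Tied tokens, such as the three entries at 0.82 above, therefore keep their vocabulary order in this sketch. A real dashboard might break ties differently.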
Activation Density: 0.021%
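A density of 0.021% means the feature is active on roughly 21 of every 100,000 tokens, which makes it a very sparse feature. The sketch below shows the arithmetic. It assumes a token counts as "active" when the feature's activation is strictly above a threshold of 0, which is typical for ReLU-style sparse autoencoders. The function name and threshold parameter are hypothetical.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of tokens on which the feature's activation exceeds `threshold`."""
    if not activations:
        raise ValueError("need at least one token")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Toy example: the feature fires on 21 of 100,000 tokens.
acts = [1.0] * 21 + [0.0] * 99_979
print(activation_density(acts))  # 0.021
```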