INDEX
Explanations
occurrences of the word "Hello"
the occurrence of the phrase "Hello" in various contexts
Negative Logits
arian       -0.87
aic         -0.81
hip         -0.79
ifiable     -0.79
eele        -0.78
pite        -0.78
uing        -0.78
arians      -0.77
nutrition   -0.75
rovers      -0.74
Positive Logits
Kitty       1.01
hello       0.84
Neighbor    0.83
!.          0.78
Hello       0.73
!,          0.73
Goodbye     0.71
WORLD       0.70
!".         0.70
ãĥ¼ãĥ«      0.69
Activations Density 0.015%