Explanations
instances where the word "hi" is present with a high activation value, potentially indicating a specific focus on this word
repetitions of the phrase "hi."
Negative Logits
Token         Logit
Cosponsors    -0.77
Aven          -0.72
Sorceress     -0.68
Izan          -0.65
convol        -0.64
Jenner        -0.64
ilater        -0.62
Euph          -0.62
orative       -0.62
NetMessage    -0.60
Positive Logits
Token         Logit
hi            1.16
ya            1.07
emen          0.98
hei           0.95
roth          0.93
oga           0.90
emi           0.89
wa            0.89
omo           0.87
oka           0.87
Activations Density: 0.006%