Explanations
a specific word with the letters 'qu' followed by a high activation value
the word "qu" in various contexts
Negative Logits
Norn        -0.68
Heidi       -0.66
ASED        -0.64
Dayton      -0.64
foreskin    -0.63
handlers    -0.63
ATURE       -0.61
SEA         -0.61
Bakr        -0.60
Aus         -0.59
Positive Logits
qu          4.43
QU          2.32
quer        2.29
qua         2.14
quin        2.11
quit        2.11
ques        2.00
Qu          1.90
quet        1.82
que         1.80
Activations Density 0.021%