Explanations
variations of the word "this."
proper nouns starting with Thi
Negative Logits
Token                      Logit
(token missing in source)  -0.62
Kog                        -0.58
Inflater                   -0.57
hrens                      -0.56
crats                      -0.55
RER                        -0.55
ruz                        -0.55
ſelf                       -0.54
featureID                  -0.53
Opus                       -0.52
Positive Logits
Token     Logit
Thi       2.27
Thi       2.19
thi       1.97
THI       1.38
thi       1.36
Thiago    1.18
Thiel     0.99
thia      0.96
thie      0.90
Thiru     0.85
Activations Density: 0.005%