INDEX
Explanations
the letter "G" in various contexts
Negative Logits
ĸļ            -0.78
Mellon        -0.62
compr         -0.61
derail        -0.61
parted        -0.59
zo            -0.59
bottleneck    -0.59
htt           -0.59
reprodu       -0.58
puter         -0.58
Positive Logits
roups         1.34
raphic        1.29
reetings      1.23
ossip         1.20
reens         1.19
ourmet        1.18
irlfriend     1.15
iants         1.14
athering      1.11
rowth         1.11
Activations Density 0.033%