Explanations
the word "through" in various contexts
Negative logits
Token     Logit
ised      -0.16
zed       -0.14
iments    -0.14
aken      -0.14
μην       -0.14
suppress  -0.14
andes     -0.13
oki       -0.13
akens     -0.13
æį®       -0.13

Positive logits
Token     Logit
ought     0.35
puts      0.34
-out      0.31
OUT       0.30
ly        0.30
put       0.28
out       0.28
ogh       0.28
ough      0.27
/by       0.27
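The page does not say how these logit values were computed. A common approach for sparse-autoencoder features, assumed here, is to project the feature's decoder direction through the model's unembedding matrix and keep the k most negative and k most positive tokens. The sketch below is pure Python with hypothetical names (`logit_effects`, `top_logit_effects`) and toy numbers, not real model weights.

```python
from typing import List, Sequence, Tuple


def logit_effects(direction: Sequence[float],
                  unembed: Sequence[Sequence[float]]) -> List[float]:
    """Dot a feature direction with every vocabulary column of the unembedding.

    Assumed (hypothetical) layout: `unembed` has d_model rows and vocab_size columns.
    """
    vocab_size = len(unembed[0])
    return [
        sum(direction[i] * unembed[i][t] for i in range(len(direction)))
        for t in range(vocab_size)
    ]


def top_logit_effects(direction: Sequence[float],
                      unembed: Sequence[Sequence[float]],
                      vocab: Sequence[str],
                      k: int) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most negative and the k most positive (token, effect) pairs."""
    effects = logit_effects(direction, unembed)
    ranked = sorted(zip(vocab, effects), key=lambda pair: pair[1])  # ascending
    negative = ranked[:k]            # smallest effects first
    positive = ranked[::-1][:k]      # largest effects first
    return negative, positive


# Toy example: 2-dimensional residual stream, 4-token vocabulary (made-up values).
vocab = ["out", "put", "ised", "zed"]
direction = [1.0, 0.5]
unembed = [
    [0.2, 0.1, -0.1, 0.0],
    [0.2, 0.3, -0.1, -0.2],
]
negative, positive = top_logit_effects(direction, unembed, vocab, k=2)
print([tok for tok, _ in negative])  # ['ised', 'zed']
print([tok for tok, _ in positive])  # ['out', 'put']
```

Sorting once and slicing from both ends keeps the two lists consistent with each other; for a real vocabulary of tens of thousands of tokens, a vectorized top-k would be the practical choice.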

Activation density: 0.089%
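Activation density is usually read as the fraction of sampled token positions on which the feature fires; 0.089% corresponds to roughly 89 of every 100,000 tokens. The page does not state the sample or the firing threshold, so the sketch below assumes "fires" means an activation strictly above zero. The function name `activation_density` and the toy sample are hypothetical.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Fraction of token positions whose activation exceeds `threshold` (assumed 0)."""
    if len(activations) == 0:
        raise ValueError("need at least one activation")
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Hypothetical sample: 89 firing positions out of 100,000 tokens.
sample = [0.0] * 99_911 + [1.0] * 89
density = activation_density(sample)
print(f"{density * 100:.3f}%")  # 0.089%
```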