INDEX
Explanations
variations of the word "in" across different contexts
Negative Logits
Token           Logit
adiens          -0.15
.Experimental   -0.14
205             -0.14
licher          -0.14
Hath            -0.14
quette          -0.13
åĸ              -0.13
agini           -0.13
rones           -0.13
важ             -0.13
Positive Logits
Token           Logit
premises        0.15
-the            0.15
premise         0.15
-               0.14
-pro            0.14
oven            0.14
-in             0.14
Zum             0.14
izi             0.14
situ            0.13
Activations Density: 0.076%