INDEX
Explanations
instances of the word "in" across various contexts
Negative Logits
gether            -0.19
clusion           -0.17
rencont           -0.16
.scalablytyped    -0.16
/by               -0.15
case              -0.15
lessness          -0.15
care              -0.14
STRUCTION         -0.14
which             -0.14
Positive Logits
danger             0.23
fact               0.21
direct             0.21
extr               0.20
flux               0.20
itself             0.19
essence            0.19
Danger             0.19
esc                0.18
keeping            0.17
Activations Density 0.107%
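A figure like 0.107% can be read as a percentage of token positions on which the feature fires. The sketch below assumes "density" means the share of positions whose activation is strictly above a threshold of 0. That definition, the function name `activation_density`, and the sample activations are all illustrative assumptions and do not come from the dashboard itself.

```python
def activation_density(activations: list[float], threshold: float = 0.0) -> float:
    """Percentage of token positions whose activation exceeds `threshold`.

    Assumption: the dashboard counts a position as "active" when the
    feature's activation is strictly greater than 0.
    """
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical activations over 1,000 tokens, with exactly one firing position.
acts = [0.0] * 999 + [3.2]
print(f"Activations Density {activation_density(acts):.3f}%")  # prints "Activations Density 0.100%"
```

Under this assumed definition, a density of about 0.107% works out to roughly one active position per 935 tokens. The feature is sparse, as expected for a single SAE latent.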