    Explanations

    instances of the word "in" across various contexts

    Negative Logits
    "gether"           -0.19
    "clusion"          -0.17
    " rencont"         -0.16
    ".scalablytyped"   -0.16
    "/by"              -0.15
    "case"             -0.15
    "lessness"         -0.15
    "care"             -0.14
    "STRUCTION"        -0.14
    "which"            -0.14

    Positive Logits
    " danger"          0.23
    " fact"            0.21
    " direct"          0.21
    "extr"             0.20
    " flux"            0.20
    " itself"          0.19
    " essence"         0.19
    " Danger"          0.19
    "esc"              0.18
    " keeping"         0.17
    Activation Density: 0.107%

    No Known Activations