INDEX
    Explanations

    occurrences of the word "in."

    NEGATIVE LOGITS
    Token          Logit
    " Coord"       -0.15
    "LOPT"         -0.15
    "evi"          -0.14
    "formation"    -0.14
    " fate"        -0.14
    " Regards"     -0.14
    "ffects"       -0.14
    "ç¡"           -0.13
    "Ñģли"         -0.13
    "illin"        -0.13
    POSITIVE LOGITS
    Token          Logit
    "won"          0.21
    "midd"         0.20
    "wend"         0.18
    "der"          0.18
    "enting"       0.17
    "zag"          0.17
    " het"         0.17
    " Europa"      0.16
    ".cx"          0.16
    "span"         0.15
    Act Density 0.008%

    No Known Activations