INDEX
    Explanations

    Occurrences of the word "in" and its variants, suggesting a focus on spatial or contextual relationships.

    Negative Logits
    Token        Logit
    "abella"     -0.16
    ".decor"     -0.14
    "onto"       -0.14
    "835"        -0.14
    "ense"       -0.14
    "pered"      -0.13
    "berapa"     -0.13
    "vault"      -0.13
    "XHR"        -0.13
    "itm"        -0.13
    Positive Logits
    Token        Logit
    " action"    0.23
    "-action"    0.17
    "auen"       0.17
    " drag"      0.17
    "má"         0.16
    " uniform"   0.16
    "Uniform"    0.16
    " Uniform"   0.15
    ".action"    0.15
    "klä"        0.15
    Act Density 0.104%

    No Known Activations