INDEX
    Explanations

    instances of the word "another."

    Negative Logits
    " other"      -0.18
    " further"    -0.16
    "åı¦"         -0.16
    " autres"     -0.16
    "ups"         -0.16
    "ses"         -0.15
    "_OTHER"      -0.14
    " outras"     -0.14
    " Other"      -0.14
    " andre"      -0.14
    Positive Logits
    "-than"       0.23
    " equally"    0.20
    "world"       0.17
    "ness"        0.17
    "¢åįķ"        0.17
    "ovnÄĽ"       0.17
    "ildo"        0.16
    "layer"       0.15
    " dozen"      0.15
    " layer"      0.15
    Act Density 0.042%

    No Known Activations
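
Some tokens in the logit lists (for example åı¦, ¢åįķ and ovnÄĽ) look like mis-encoded text. They are plausibly raw byte-level BPE tokens shown in the printable alphabet that GPT-2 style tokenizers use for bytes. The sketch below assumes that scheme; the dashboard does not say which tokenizer the model uses, so treat the decoded strings as a guess. The helper names `byte_level_alphabet` and `decode_token` are hypothetical and introduced only for illustration.

```python
def byte_level_alphabet():
    """Byte -> printable character table of GPT-2 style byte-level BPE.

    Assumption: the model behind this dashboard uses that table.
    """
    # Bytes that are already printable map to themselves.
    printable = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(0xA1, 0xAC + 1))
        + list(range(0xAE, 0xFF + 1))
    )
    table = {b: chr(b) for b in printable}
    # Every remaining byte is shifted into the range starting at U+0100.
    shift = 0
    for b in range(256):
        if b not in table:
            table[b] = chr(256 + shift)
            shift += 1
    return table


# Inverse table: displayed character -> original byte value.
CHAR_TO_BYTE = {ch: b for b, ch in byte_level_alphabet().items()}


def decode_token(token):
    """Turn a displayed token back into text, if it is valid UTF-8."""
    raw = bytearray()
    for ch in token:
        if ch in CHAR_TO_BYTE:
            raw.append(CHAR_TO_BYTE[ch])
        else:
            # Characters outside the table (such as a literal space the
            # dashboard already rendered) are kept as they are.
            raw.extend(ch.encode("utf-8"))
    # Tokens can start or end in the middle of a multi-byte character,
    # so undecodable bytes are replaced with U+FFFD instead of raising.
    return raw.decode("utf-8", errors="replace")


if __name__ == "__main__":
    for shown in ["åı¦", "ovnÄĽ", "¢åįķ", " other"]:
        print(repr(shown), "->", repr(decode_token(shown)))
```

Under that assumption, åı¦ decodes to the single Chinese character 另 and ovnÄĽ to "ovně", while ¢åįķ begins with a stray continuation byte and so decodes only partially. Plain ASCII tokens such as " other" pass through unchanged.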