INDEX
    Explanations

    words related to existence or presence, particularly in a structured or formal context

    Negative Logits
    Token     Logit
    er        -0.34
    r         -0.26
    ar        -0.24
    र         -0.24
    erse      -0.22
    rage      -0.21
    ORE       -0.20
    ان        -0.20
    ract      -0.20
    rne       -0.20

    Positive Logits
    Token     Logit
    hetics    0.27
    hetic     0.25
    ablish    0.24
    ech       0.23
    ev        0.22
    eh        0.22
    eb        0.21
    ead       0.20
    ee        0.20
    ewart     0.20

    Activation Density: 0.041%

    No Known Activations