INDEX
    Explanations

    words associated with connections and relationships

    Negative Logits
    Token          Logit
    "umar"         -0.17
    "uchen"        -0.15
    "fff"          -0.15
    " Pru"         -0.15
    "shall"        -0.15
    "oze"          -0.15
    "hen"          -0.15
    " ace"         -0.14
    "Alternate"    -0.14
    "hi"           -0.14
    Positive Logits
    Token          Logit
    "ég"           0.22
    "zt"           0.21
    "ág"           0.20
    "zo"           0.19
    "zer"          0.18
    "ietet"        0.18
    "rung"         0.17
    "zen"          0.17
    "zc"           0.17
    "ereg"         0.15
    Activation Density: 0.001%

    No Known Activations