INDEX
    Explanations

    names of individuals and their relationships or associations with different contexts

    Negative Logits
    Token    Logit
    " d"     -0.49
    " D"     -0.44
    " د"     -0.41
    " ד"     -0.39
    " দ"     -0.37
    " Д"     -0.36
    " da"    -0.36
    " DA"    -0.35
    " Da"    -0.34
    " ड"     -0.34
    Positive Logits
    Token    Logit
    "dan"    1.46
    "don"    1.38
    "ders"   1.28
    "dog"    1.26
    "dr"     1.24
    "din"    1.24
    "dam"    1.23
    "done"   1.23
    "dor"    1.22
    "down"   1.18
    Act Density 0.297%
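
A minimal sketch of how an activation density figure like the one above could be computed, assuming it means the fraction of sampled token positions on which the feature fires (activation above zero). The page itself does not state the definition, so that reading, the function name `activation_density`, the `threshold` parameter, and the sample data are all hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of sampled tokens on which the
    feature is active; this definition is not taken from the page.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: 1000 token positions, 3 of which fire.
sample = [0.0] * 997 + [0.8, 1.3, 0.2]
density = activation_density(sample)
print(f"Act Density {density:.3%}")
```

Under this reading, a density of 0.297% would correspond to the feature firing on roughly 3 of every 1000 sampled tokens.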

    No Known Activations