    Explanations

    pairs, opposites, or divisions

    Negative Logits
    c          0.64
    s          0.59
    t          0.52
    س          0.50
    d          0.50
    с          0.49
    e          0.49
    δε         0.48
    giving     0.47
     offices   0.45
    Positive Logits
     tekint      0.49
     kraja       0.48
     رکھتا       0.47
     lógico      0.47
     Niveau      0.47
     vibhav      0.47
     zašt        0.47
     tarko       0.45
     duidelijk   0.45
     chiaro      0.45
    Activation Density: 0.001%

    No Known Activations