    Explanations

    instances of sameness or similarity in concepts or experiences

    Negative Logits
    Token            Logit
    "rud"            -0.17
    " beyond"        -0.16
    " itself"        -0.16
    "amac"           -0.15
    " alone"         -0.15
    "alian"          -0.15
    " besonders"     -0.15
    "ivec"           -0.15
    "FormControl"    -0.14
    "ntag"           -0.14
    Positive Logits
    Token            Logit
    " except"        0.27
    "except"         0.23
    " than"          0.21
    " minus"         0.21
    " identical"     0.21
    "minus"          0.21
    " Except"        0.21
    "Except"         0.20
    "same"           0.20
    " same"          0.19
    Activation Density: 0.104%

    No Known Activations