INDEX
    Explanations

    references to the concept of knowledge or recognition

    Negative Logits
    "rana"    -0.17
    "fts"     -0.16
    "ares"    -0.15
    "uros"    -0.14
    "Stub"    -0.14
    "nova"    -0.13
    "ви"      -0.13
    "itler"   -0.13
    " 位"     -0.13
    "orgh"    -0.13
    Positive Logits
    " simply"       0.29
    " popular"      0.24
    " Simply"       0.23
    "Simply"        0.23
    "popular"       0.20
    "s"             0.20
    " simplement"   0.20
    " familiar"     0.19
    "col"           0.19
    " inform"       0.19
    Act Density 0.024%

    No Known Activations