INDEX
    Explanations

    words related to exploration and discovery

    Negative Logits
    uracy      -0.16
    ména       -0.15
    ilia       -0.15
     Unidos    -0.15
    η          -0.15
    ural       -0.15
    igi        -0.15
    hee        -0.14
    ched       -0.14
    endar      -0.14
    Positive Logits
     ways      0.16
    horn       0.15
    lust       0.15
    ments      0.15
    es         0.14
    948        0.14
    غة         0.14
    ives       0.14
    ry         0.14
    67         0.14
    Activation Density: 0.031%

    No Known Activations