INDEX
    Explanations

    comparisons and contrasts between different regions, historical events, or societal issues

    Negative Logits
    Token       Logit
    " Bylo"     -0.17
    "urum"      -0.16
    "Spec"      -0.15
    "bert"      -0.14
    "alon"      -0.14
    "apesh"     -0.14
    "ogl"       -0.14
    "zs"        -0.14
    "ogue"      -0.14
    "ÙĤب"       -0.14
    Positive Logits
    Token             Logit
    " similar"        0.21
    " experience"     0.19
    " comparable"     0.18
    "similar"         0.17
    "imilar"          0.16
    " experi"         0.16
    "experience"      0.16
    " past"           0.16
    "rollo"           0.16
    " experiencia"    0.16
    Activation Density: 0.305%

    No Known Activations