INDEX
    Explanations

    references to individuals and their relationships in various contexts

    Negative Logits
    "ebek"   -0.19
    "lub"    -0.17
    "ruba"   -0.16
    "ikip"   -0.15
    "aldi"   -0.15
    "juana"  -0.15
    "gte"    -0.15
    "oldem"  -0.15
    "ignal"  -0.15
    "igham"  -0.15
    Positive Logits
    "áž"            0.15
    " lessons"      0.15
    " gag"          0.14
    " throughout"   0.14
    "132"           0.14
    "ither"         0.14
    " Cov"          0.14
    " STACK"        0.14
    " Throughout"   0.14
    "olini"         0.14
    Act Density 0.314%

    No Known Activations