INDEX
    Explanations

    The presence of the word "La" in various contexts

    Negative Logits
    "auc"          -0.19
    "rest"         -0.15
    "phan"         -0.14
    "h"            -0.14
    " Maxim"       -0.14
    "ês"           -0.14
    "sea"          -0.14
    " bystand"     -0.14
    "attle"        -0.14
    "rias"         -0.14

    Positive Logits
    "unched"        0.26
    "uren"          0.23
    "ikip"          0.20
    "undry"         0.20
    "uded"          0.19
    "urence"        0.19
    "zyst"          0.19
    "mgr"           0.18
    "uder"          0.18
    "oshi"          0.17
    Activation Density: 0.021%

    No Known Activations