    Explanations

    references to specific names of individuals, particularly the name "Lu," which it activates for

    Negative Logits
    lu              -1.16
    luk             -0.83
    luc             -0.81
    ThemeOverlay    -0.75
    lug             -0.68
    luor            -0.67
    lup             -0.65
                    -0.61
    lul             -0.60
    Cita            -0.60

    Positive Logits
     Lu             2.17
    Lu              1.77
    expandindo      0.70
    posedge         0.63
    ագրություններ   0.58
    aarrggbb        0.57
    Ecotoxicity     0.56
    Зноскі          0.55
    feira           0.54
     שוליים         0.54
    Activation Density: 0.001%

    No Known Activations