INDEX
    Explanations

    words related to confusion or complexity

    Negative Logits
    le          -0.66
    lej         -0.32
    lea         -0.28
    Leod        -0.27
    lek         -0.27
    leitung     -0.24
    er          -0.22
    lein        -0.22
    lei         -0.22
    leo         -0.19
    Positive Logits
    lesh        0.31
    led         0.27
    LES         0.27
    legate      0.26
    lescope     0.25
    les         0.25
    ling        0.25
    ler         0.25
    lename      0.24
    leground    0.24
    Activation Density: 0.059%

    No Known Activations