INDEX
    Explanations

    references to filenames and file-related terminology

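    Negative Logits
    (Tokens are quoted; a leading space is part of the token.)
    Token         Logit
    "e"           -0.82
    "a"           -0.81
    "st"          -0.69
    "es"          -0.68
    "us"          -0.65
    "k"           -0.65
    "o"           -0.65
    " st"         -0.61
    "ل"           -0.58   (Arabic letter lām)
    "i"           -0.58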
    Positive Logits
    Token         Logit
    "filename"     2.64
    " filename"    2.59
    "Filename"     2.11
    "FILENAME"     1.90
    " Filename"    1.81
    " filenames"   1.78
    "FileName"     1.60
    "filenames"    1.49
    "文件名"        1.47   (Chinese for "filename")
    " fileName"    1.38
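    The page does not say how the logit values above were computed. Dashboards of this kind commonly score each vocabulary token by taking the dot product of the feature's decoder direction with that token's unembedding vector, then listing the highest and lowest scores. The sketch below assumes that method and ignores any final layer norm. The function name `feature_logit_effects` and the toy vectors are hypothetical and serve only as illustration.

```python
from typing import Sequence


def feature_logit_effects(
    decoder_dir: Sequence[float],
    unembed: dict[str, Sequence[float]],
    k: int = 10,
) -> tuple[list[tuple[str, float]], list[tuple[str, float]]]:
    """Hypothetical sketch: score each token by dot(decoder_dir, unembedding vector).

    Assumes the logit effect is a plain dot product (no layer norm applied).
    Returns (top-k positive, top-k negative), each sorted by magnitude of effect.
    """
    scores = {
        token: sum(d * u for d, u in zip(decoder_dir, column))
        for token, column in unembed.items()
    }
    ranked = sorted(scores.items(), key=lambda item: item[1])
    negative = ranked[:k]        # most negative scores first
    positive = ranked[::-1][:k]  # most positive scores first
    return positive, negative


# Toy example with a 2-dimensional residual stream (illustrative values only).
toy_unembed = {" filename": [2.0, 0.5], "e": [-0.8, 1.0], "x": [0.1, 0.0]}
pos, neg = feature_logit_effects([1.0, 0.0], toy_unembed, k=1)
print(pos, neg)  # [(' filename', 2.0)] [('e', -0.8)]
```

    Under this assumption, a token ranks highly when its unembedding vector points in roughly the same direction as the feature's decoder vector, which is consistent with the filename variants dominating the positive list.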
    Activation density: 0.065%
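    Activation density is usually read as the share of token positions on which the feature fires. A density of 0.065% is 0.00065 as a fraction, or about 65 active tokens per 100,000. The sketch below assumes that definition and assumes "fires" means an activation strictly above zero. Both the threshold and the function name `activation_density` are hypothetical.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Hypothetical sketch: percentage of token positions with activation > threshold."""
    if not activations:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# 65 active positions out of 100,000 gives the density shown above.
sample = [0.0] * 99_935 + [1.0] * 65
print(activation_density(sample))  # 0.065
```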

    No Known Activations