    Explanations

    the word "mask" with high activations, and related words like "disguise" with lower activations

    references to masks and disguises

    Negative Logits
    Token                        Logit
    "athan"                      -0.74
    "âĶĢâĶĢâĶĢâĶĢâĶĢâĶĢâĶĢâĶĢ"   -0.72
    "scill"                      -0.72
    "course"                     -0.70
    "GGGGGGGG"                   -0.68
    " Yards"                     -0.66
    "lished"                     -0.66
    "ALLY"                       -0.66
    "ally"                       -0.66
    "rian"                       -0.65
    Positive Logits
    Token                        Logit
    " masks"                     1.12
    " mask"                      1.06
    "resses"                     0.91
    " Mask"                      0.88
    "mask"                       0.83
    " wearer"                    0.80
    "Mask"                       0.80
    " worn"                      0.80
    " concealed"                 0.79
    "ħĭ"                         0.77
    Activation Density: 0.017%

    No Known Activations