    Explanations

    This neuron detects occurrences of the word “role.”

    Negative Logits
    "weet"          -0.08
    " cleaning"     -0.08
    " KING"         -0.07
    " Davis"        -0.07
    " inspection"   -0.07
    " Mint"         -0.06
    "gorithm"       -0.06
    "Math"          -0.06
    " Smith"        -0.06
    " Bay"          -0.06
    Positive Logits
    " Role"         0.14
    " role"         0.14
    "Role"          0.13
    "role"          0.12
    "-role"         0.12
    " roles"        0.12
    " ROLE"         0.10
    " Roles"        0.10
    " roleName"     0.10
    ".role"         0.09
    Activation Density 0.038%

    No Known Activations