    Explanations

    Self-reference

    This neuron detects first-person self-referential words and role/identity declarations (tokens like "I", "I'm", "am" and similar self-identifying phrases).

    Negative Logits
    IVING       -0.08
    ario        -0.08
    /list       -0.07
    OVE         -0.07
    IRE         -0.07
    ITA         -0.07
    _sphere     -0.07
    ijo         -0.07
    ARIO        -0.07
    .Hidden     -0.06

    Positive Logits
     licz        0.07
     FIFA        0.06
    ];↵↵↵       0.06
    να           0.06
    .getApp      0.06
     neby        0.06
     essa        0.06
     porte       0.06
    'nde         0.05
     pochop      0.05
    Activation Density: 0.175%

    No Known Activations