INDEX
    Explanations

    punctuation

    This neuron detects words and phrases used for disclaiming, denying, or clarifying (e.g., “no,” “denied,” “did nothing,” “clarifies”).

    Negative Logits
    Token         Logit
    "З"           -0.07
    "legs"        -0.07
    "break"       -0.07
    "wd"          -0.07
    "cookie"      -0.06
    "Effects"     -0.06
    " Hud"        -0.06
    "89"          -0.06
    "password"    -0.06
    "RG"          -0.06
    Positive Logits
    Token               Logit
    "_IOC"              0.07
    " θέση"             0.07
                        0.07
    ".initializeApp"    0.07
    " ballo"            0.06
    " raz"              0.06
    " aVar"             0.06
    " zač"              0.06
    " ){↵"              0.06
    "?>><?"             0.06
    Act Density 0.041%

    No Known Activations