    Explanations

    The neuron detects boastful or self-aggrandizing language (words expressing bragging or pride).

    Negative Logits
    authors     -0.06
    .path       -0.06
     futuro     -0.06
     kişisel    -0.06
     competing  -0.06
     bottled    -0.06
     Από        -0.06
    params      -0.06
     hypers     -0.06
     ayrı       -0.06
    Positive Logits
     Vanguard   0.07
     Kare       0.07
    alyze       0.07
    FixedSize   0.06
    바이        0.06
    Senator     0.06
    avy         0.06
    分析        0.06
    .Report     0.06
     podařilo   0.06
    Activation Density: 0.179%

    No Known Activations