INDEX
    Explanations

    pretending or facades

    The neuron activates on words or word pieces that denote pretending or putting on a façade (e.g. “pretense,” “finge,” “façade”).

    Negative Logits
    Token             Logit
    " στα"            -0.06
    "ким"             -0.06
    " anv"            -0.06
    " numa"           -0.06
    ".Res"            -0.06
    ".Green"          -0.06
    "-END"            -0.05
    " utilisateur"    -0.05
    " 각각"           -0.05
    "PageRoute"       -0.05
    Positive Logits
    Token             Logit
    "UNC"             0.08
    "vers"            0.07
    "ocities"         0.07
    " superficial"    0.07
    "VERS"            0.07
    " vitality"       0.07
    "mpeg"            0.07
    "그래"            0.07
    " Coron"          0.07
    "game"            0.06
    Act Density 0.029%
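
The activation density figure can be read as the share of token positions on which the neuron fires. The sketch below illustrates that reading only; it assumes density means "fraction of positions with activation above a threshold", and the function name `activation_density`, the threshold of 0.0, and the sample data are all hypothetical, not taken from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: "density" means the share of positions where the neuron fires.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: 10,000 token positions, 3 of which fire.
acts = [0.0] * 10000
for i in (12, 4051, 9876):
    acts[i] = 1.5

density = activation_density(acts)
print(f"{density:.3%}")
```

A density this low means the neuron is silent on almost every token, which fits a feature tied to a narrow set of words.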

    No Known Activations