INDEX
    Explanations

    problems or flaws

    The neuron activates on critical, evaluative language that points out weak or flawed evidence (e.g., terms like “rejecting,” “scientific evidence,” “negative consequences,” “harm,” and “reasons why”).

    Negative Logits
    .getMethod          -0.07
    CallableWrapper     -0.06
    .cornerRadius       -0.06
    Usuario             -0.06
    belongsTo           -0.06
    _hidden             -0.06
     onPause            -0.06
     ق                  -0.06
     tục                -0.06
    .addTab             -0.06
    Positive Logits
    ϊ                   0.07
     vítěz              0.07
    QUENCY              0.07
     Chance             0.07
    \">"                0.07
    eways               0.06
    RAFT                0.06
    (fs                 0.06
     outsiders          0.06
    <Real               0.06
    Act Density 0.060%

    No Known Activations