INDEX
    Model:             gemma-2-9b-it
    Layer #:           20
    Steering Hook:     blocks.20.hook_resid_pre
    Steering Strength: 69
    Uploader:          bot-neuronpedia
    Created At:        2/15/2025 1:06:43 AM
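
    As a rough illustration of what the hook and strength fields mean, the sketch below adds a scaled steering vector to every position of a residual-stream activation. It is a minimal sketch under stated assumptions: the function name `apply_steering`, the toy 4-dimensional vector, and the plain additive form (no normalization) are hypothetical and not taken from this page. The raw vector for this entry is not reproduced here.

```python
from typing import List

# Values copied from the metadata on this page.
STEERING_HOOK = "blocks.20.hook_resid_pre"
STEERING_STRENGTH = 69.0


def apply_steering(
    resid: List[List[float]], vector: List[float], strength: float
) -> List[List[float]]:
    """Return a copy of resid with strength * vector added at every position.

    resid is shaped [positions][d_model]; vector is shaped [d_model].
    Assumption: plain additive steering; the real pipeline may differ.
    """
    if any(len(row) != len(vector) for row in resid):
        raise ValueError("vector length must match the residual width")
    return [[x + strength * v for x, v in zip(row, vector)] for row in resid]


# Toy stand-ins (hypothetical): 2 positions, d_model = 4.
toy_resid = [[0.0, 1.0, 2.0, 3.0], [1.0, 1.0, 1.0, 1.0]]
toy_vector = [0.5, 0.0, -0.5, 0.0]

steered = apply_steering(toy_resid, toy_vector, STEERING_STRENGTH)
print(steered[0])
```

    In a real setup the same addition would presumably happen inside a forward hook registered at the hook point named above, so that every later layer sees the shifted activations.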
    Explanations

    terms related to potential hazards or negative outcomes

    Negative Logits
    "Personendaten"     -0.69
    "styleType"         -0.61
    "WriteTagHelper"    -0.59
    ":✨"                -0.58
    "GOTREF"            -0.57
    " Paglinawan"       -0.57
    "thentication"      -0.56
    " IFTT"             -0.56
    "abstractmethod"    -0.55
    "NOUNC"             -0.54
    Positive Logits
    " risk"             0.55
    " risks"            0.53
    " riesgo"           0.46
    "risk"              0.42
    " riesgos"          0.42
    "Risk"              0.42
    " risiko"           0.40
    " Risk"             0.39
    " Risiken"          0.38
    " caution"          0.37
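
    The logit lists are usually read as the tokens a direction most promotes or suppresses. One way such a ranking can be produced, assumed here rather than confirmed by this page, is to take the dot product of the vector with each token's unembedding row and sort the results. In the sketch below, `rank_tokens`, the 2-dimensional vector, and every unembedding value are hypothetical toy data; only the token strings are borrowed from this page.

```python
from typing import Dict, List, Tuple

Scored = List[Tuple[str, float]]


def rank_tokens(
    vector: List[float], unembed: Dict[str, List[float]], k: int
) -> Tuple[Scored, Scored]:
    """Return (top-k most positive, top-k most negative) tokens.

    Each score is the dot product of vector with that token's unembedding row.
    """
    scores = {
        tok: sum(v * w for v, w in zip(vector, row))
        for tok, row in unembed.items()
    }
    ordered = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return ordered[:k], ordered[::-1][:k]


# Toy data (hypothetical values, not the model's real weights).
toy_vector = [1.0, 0.0]
toy_unembed = {
    " risk": [0.5, 0.25],
    " caution": [0.25, 0.5],
    "styleType": [-0.5, 0.0],
    "abstractmethod": [-0.25, 0.75],
}

positive, negative = rank_tokens(toy_vector, toy_unembed, k=2)
print(positive)
print(negative)
```

    Tokens are kept as exact strings, leading space included, because " risk" and "risk" are distinct vocabulary entries and receive separate scores.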
    Act Density 0.000%

    No Known Activations