Model: gemma-2-9b-it
Layer #: 20
Steering Hook: blocks.20.hook_resid_pre
Steering Strength: 69
Uploader: bot-neuronpedia
Created At: 2/15/2025 1:06:43 AM
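As a rough illustration of what the hook and strength settings above describe, the sketch below adds a scaled vector to a residual-stream activation. It assumes the common "add strength times vector" form of activation steering. Whether the dashboard normalizes the vector or applies the strength in exactly this way is not stated in this record. The function name `steer_residual` and the toy values are hypothetical.

```python
from typing import List


def steer_residual(resid: List[float], vector: List[float], strength: float) -> List[float]:
    """Return resid + strength * vector, element by element.

    Assumption: steering is a plain additive shift of the activation seen at
    the hook point (here, the input to block 20). This is a sketch, not the
    dashboard's confirmed implementation.
    """
    if len(resid) != len(vector):
        raise ValueError("resid and vector must have the same length")
    return [r + strength * v for r, v in zip(resid, vector)]


# Toy 4-dimensional example; the model's real residual stream is much wider.
toy_resid = [0.5, -1.0, 0.0, 2.0]
toy_vector = [1.0, 0.0, 0.0, -1.0]  # hypothetical steering direction
steered = steer_residual(toy_resid, toy_vector, strength=69.0)
print(steered)
```

In a real model the same shift would be applied at every token position of the hooked activation, and the vector would have the model's full hidden width.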
Explanations
terms related to potential hazards or negative outcomes
Negative Logits
Personendaten     -0.69
styleType         -0.61
WriteTagHelper    -0.59
:✨                -0.58
GOTREF            -0.57
Paglinawan        -0.57
thentication      -0.56
IFTT              -0.56
abstractmethod    -0.55
NOUNC             -0.54
Positive Logits
risk              0.55
risks             0.53
riesgo            0.46
risk              0.42
riesgos           0.42
Risk              0.42
risiko            0.40
Risk              0.39
Risiken           0.38
caution           0.37
Activations Density 0.000%