INDEX
Explanations
caught, discovered
The neuron detects advice or instructions focused on avoiding detection or being caught after committing wrongdoing.
New Auto-Interp
Negative Logits
-block
-0.08
Trong
-0.07
стен
-0.07
Su
-0.06
mundane
-0.06
Sterling
-0.06
문의
-0.06
chaotic
-0.06
Pru
-0.06
*((
-0.06
POSITIVE LOGITS
_NAME
0.06
uggestions
0.06
lest
0.06
UTIL
0.06
振り
0.06
empt
0.06
$title
0.05
ϊ
0.05
VAL
0.05
reperc
0.05
Activations Density 0.009%