Explanations
mediocre
The neuron detects evaluative quality descriptors, especially the word “mediocre” and other rating phrases such as “extremely bad”.
Negative Logits
Token       Logit
hate        -0.06
elease      -0.06
HEET        -0.06
336         -0.06
lance       -0.06
乾          -0.06
=-=-=-=-    -0.06
gst         -0.05
))+         -0.05
られた      -0.05
Positive Logits
Token        Logit
mediocre     0.08
Instruction  0.08
subjects     0.07
instruction  0.07
국가         0.07
midi         0.07
gaz          0.07
VN           0.07
Capability   0.07
prostoru     0.07
Activations Density: 0.001%