Explanations
The neuron activates on numeric literals in the text, especially floating-point numbers.
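To make the explanation concrete, the sketch below shows one way to flag the kind of text it describes. The regular expression and the helper name `numeric_literals` are illustrative assumptions, not part of the dashboard. The neuron itself operates on model tokens, so its actual firing pattern may differ from this character-level approximation.

```python
import re

# Hypothetical pattern for numeric literals (an assumption for illustration).
NUMERIC_LITERAL = re.compile(
    r"(?<![\w.])"            # not preceded by an identifier character or a dot
    r"(?:\d+\.\d+|\d+)"      # a float such as 3.14, or an integer such as 42
    r"(?:[eE][-+]?\d+)?"     # optional exponent, e.g. 1e-3
    r"(?!\w|\.\d)"           # not followed by an identifier character or ".digit"
)

def numeric_literals(text):
    """Return (literal, is_float) pairs for each numeric literal found in text."""
    results = []
    for match in NUMERIC_LITERAL.finditer(text):
        literal = match.group()
        # Treat a decimal point or an exponent as the mark of a floating-point literal.
        is_float = "." in literal or "e" in literal.lower()
        results.append((literal, is_float))
    return results
```

Under this sketch, `numeric_literals("x = 3.14 + 42")` yields one floating-point literal and one integer, while digits inside identifiers such as `var1` are ignored.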
Negative Logits
Token        Logit
framing      -0.06
Vale         -0.06
रत           -0.06
enheim       -0.06
ivial        -0.06
Les          -0.06
ammers       -0.06
Surprise     -0.06
ligne        -0.06
.layers      -0.06
Positive Logits
Token        Logit
},"          0.07
้ม           0.06
@endsection  0.06
/>';↵        0.06
_al          0.06
dealing      0.06
.hasClass    0.06
shl          0.06
.virtual     0.06
?";↵         0.06
Activations Density: 0.001%