INDEX
Explanations
dark themes like loss
contractions and possessives marked by apostrophes.
The neuron picks out slot-machine jargon and feature headings (e.g. “slots,” “RTP,” “volatility,” “max win,” “bonus features”).
Negative Logits
Token    Value
an       0.79
a        0.66
er       0.64
ar       0.63
z        0.62
u        0.61
на       0.59
in       0.57
↵        0.57
ed       0.57
Positive Logits
Token    Value
(blank)  0.69
is       0.59
are      0.59
in       0.53
an       0.50
to       0.49
a        0.49
è        0.48
as       0.47
،        0.47
Activations Density 15.937%