INDEX
Explanations
risks versus rewards
The neuron flags terms that quantify trade-offs or improvements, especially words denoting gains, benefits, or efficiency increases.
Negative Logits
.gg             -0.06
公共            -0.06
Game            -0.06
θρώ             -0.06
langs           -0.06
TestCase        -0.06
predominant     -0.06
Conversely      -0.06
precondition    -0.06
ać              -0.06
Positive Logits
modified        0.06
Fant            0.06
Keeping         0.06
mins            0.06
постоянно       0.06
excessive       0.06
ecstatic        0.06
fractional      0.06
_WARN           0.06
.boost          0.06
Activations Density 0.065%