Explanations
favorable
This neuron selectively activates on the adjective “favorable.”
Negative Logits
Token         Logit
metres        -0.08
Archives      -0.07
scream        -0.07
secret        -0.07
corpses       -0.07
.Substring    -0.07
ropoda        -0.06
Screens       -0.06
loops         -0.06
inserts       -0.06
Positive Logits
Token         Logit
unfavorable   0.10
favorable     0.08
favourable    0.08
unfavor       0.07
ULA           0.07
favor         0.07
TensorFlow    0.07
благодаря     0.06
ToUpper       0.06
flattering    0.06
Activations Density: 0.004%
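
The page does not say how the density figure above is computed. A common reading, assumed here, is the fraction of sampled tokens on which the neuron's activation exceeds zero. The sketch below illustrates that reading only; the function name `activation_density`, the `threshold` parameter, and the sample data are hypothetical and not taken from the dashboard.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumption: "density" means the share of sampled tokens on which the
    neuron fires at all. The dashboard itself does not define the term.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: the neuron fires on 4 of 100,000 tokens.
sample = [0.0] * 99_996 + [1.0] * 4
density = activation_density(sample)
print(f"{density * 100:.3f}%")
```

Under this reading, a density of 0.004% means the neuron fires on roughly 4 tokens in every 100,000, which would fit a neuron tied to a single word family.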