INDEX
Explanations
references to channels or systems of organization
Negative Logits
(not shown)   -0.60
'             -0.55
’             -0.54
(not shown)   -0.52
…             -0.49
(not shown)   -0.48
tak           -0.48
!             -0.47
[             -0.47
...           -0.47
Positive Logits
."));         1.07
^(@)          1.01
Efq           0.96
Theſe         0.93
)");          0.91
Monfieur      0.89
$_"           0.89
channels      0.88
―――――         0.85
myſelf        0.84
Activations Density 0.429%