Explanations
References to layers, likely in a structural or hierarchical context
layer and layers
Negative Logits
Token      Logit
Goodwin    -0.53
Com        -0.50
OnInit     -0.49
habitude   -0.47
Credit     -0.45
Phelps     -0.45
GOOD       -0.44
ußt        -0.44
com        -0.44
przys      -0.43
Positive Logits
Token      Logit
Layer      1.37
layer      1.30
layer      1.24
Layer      1.23
LAYER      1.22
Layers     1.14
layers     1.06
layers     1.00
Layers     0.99
LAYER      0.95
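
The two logit lists above read as the lowest- and highest-scoring tokens for this feature. A minimal sketch of how such lists could be produced from (token, logit) pairs follows; the function name `top_logits` and the ranking-by-sort approach are assumptions for illustration, not the dashboard's confirmed method. The sample pairs are taken from the listing above. Pairs are kept as a list rather than a dict because the same surface string (for example "layer") can appear more than once.

```python
def top_logits(token_logits, k=10):
    """Return (most negative, most positive) k entries from (token, logit) pairs.

    Hypothetical helper: the dashboard's actual ranking code is not shown
    in the source, so this is only an illustrative sketch.
    """
    ranked = sorted(token_logits, key=lambda pair: pair[1])  # ascending by logit
    negative = ranked[:k]        # lowest logits first
    positive = ranked[::-1][:k]  # highest logits first
    return negative, positive


# A few pairs copied from the listing above.
sample = [
    ("Layer", 1.37),
    ("Goodwin", -0.53),
    ("layers", 1.06),
    ("Com", -0.50),
    ("LAYER", 0.95),
    ("przys", -0.43),
]

negative, positive = top_logits(sample, k=2)
print(negative)
print(positive)
```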
Activations Density: 0.019%
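
Activation density is usually read as the fraction of token positions on which the feature fires. A minimal sketch under that assumption follows; the function name `activation_density`, the zero threshold, and the sample data are all hypothetical and chosen only so the arithmetic matches the figure above.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of token positions whose activation exceeds the threshold.

    Assumption: "density" means the share of positions with a nonzero
    activation; the dashboard's exact definition is not given in the source.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: 100,000 token positions, feature active on 19 of them.
acts = [0.0] * 100_000
for i in range(19):
    acts[i * 5000] = 1.0

density = activation_density(acts)
print(f"{density:.3%}")
```

With 19 active positions out of 100,000 the density is 0.00019, which is 0.019% when expressed as a percentage.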