INDEX
Explanations
mentions of documentation and record-keeping practices
Negative Logits
olig          -0.15
Keyword       -0.15
olean         -0.14
/Gate         -0.14
.BAD          -0.14
Everyday      -0.13
ÏĦε           -0.13
δÎŃ           -0.13
insider       -0.13
Exclusive     -0.13
Positive Logits
acity         0.17
-information  0.17
_fields       0.16
fields        0.16
Fields        0.16
/INFO         0.15
.bits         0.15
Fields        0.15
fields        0.14
.template     0.14
Activations Density: 0.266%