INDEX
Explanations
specifying details or categories
Negative Logits
.             0.82
to            0.79
In            0.77
When          0.74
An            0.74
(             0.73
for           0.70
to            0.69
or            0.69
To            0.69
Positive Logits
classics      0.89
pesticides    0.88
antiques      0.87
explosives    0.85
<unused2142>  0.84
testAvg       0.83
nightclubs    0.83
ceramics      0.82
<unused2064>  0.82
<unused1097>  0.81
Activations Density 4.188%
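Activation density is usually reported on dashboards like this one as the percentage of tokens on which the feature's activation is above zero. The sketch below shows that calculation. It assumes the per-token activations have already been collected into a flat list and that "active" means strictly greater than a threshold of 0. The function name, the threshold parameter, and the sample activations are all hypothetical illustrations and are not taken from the dashboard.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Return the percentage of tokens whose activation exceeds `threshold`.

    Hypothetical helper: assumes `activations` holds one value per token.
    """
    if len(activations) == 0:
        raise ValueError("need at least one activation")
    # Count tokens on which the feature counts as "firing" under this threshold.
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical activations over 8 tokens; 2 of them are nonzero.
acts = [0.0, 0.0, 3.1, 0.0, 0.0, 0.0, 1.7, 0.0]
print(f"Activations Density {activation_density(acts):.3f}%")  # prints "Activations Density 25.000%"
```

On real data the list would cover a large token sample. A figure like 4.188% would then mean the feature fires on roughly 1 in 24 tokens, under the same threshold assumption.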