INDEX
Explanations
terms related to classification in various contexts
Negative Logits
<unused74>  -1.02
<unused52>  -1.02
<unused41>  -1.02
<unused51>  -1.02
<unused14>  -1.02
<unused3>   -1.02
<unused16>  -1.02
<unused23>  -1.02
ementara    -1.02
<pad>       -1.02
Positive Logits
classification  0.83
Classification  0.60
classification  0.58
Classification  0.51
form            0.51
div             0.50
↵               0.48
(blank)         0.47
liber           0.47
<eos>           0.47
Activations Density 0.233%