Explanations
phrases related to labeling and classification
references to labels and labeling practices
Negative Logits
Token        Logit
tein         -0.70
issance      -0.67
Globe        -0.66
Yin          -0.64
vati         -0.64
ashington    -0.64
ctica        -0.63
gm           -0.63
olars        -0.62
hire         -0.62
Positive Logits
Token        Logit
label        0.90
mates        0.89
labels       0.87
mate         0.85
cloth        0.80
strip        0.76
label        0.76
Label        0.75
red          0.75
mark         0.74
Activations Density: 0.022%