Explanations
instances of the term "label" within the text
Negative Logits
+#+           -0.59
baby          -0.54
myſelf        -0.53
passwords     -0.53
ghijklmnop    -0.53
iſten         -0.52
pleaſure      -0.52
cryst         -0.52
credit        -0.52
fluid         -0.51
Positive Logits
label         1.08
label         0.95
labels        0.80
Label         0.80
labels        0.70
Label         0.70
LABEL         0.69
Labels        0.68
etiqueta      0.67
LABEL         0.63
Activations Density: 0.236%