INDEX
Explanations
words related to creating or updating content
instances of creation and publication of content
Negative Logits
backs    -0.79
bill     -0.76
stood    -0.76
adra     -0.75
raf      -0.72
adish    -0.71
bull     -0.71
BL       -0.70
nder     -0.70
vor      -0.70
Positive Logits
Created       1.15
ablishment    0.93
Rating        0.80
CLASSIFIED    0.80
Detected      0.79
Thumbnails    0.77
Created       0.77
Coverage      0.75
Updated       0.75
>[            0.73
Activations Density 0.007%
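
The density figure can be read as the share of sampled tokens on which the feature is active. The sketch below illustrates that reading under stated assumptions: "active" is taken to mean an activation strictly above zero, and the function name `activation_density`, the `threshold` parameter, and the sample data are all hypothetical, not taken from the dashboard's own code.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of activations strictly greater than threshold.

    Assumption: a token counts as "active" when its activation exceeds
    the threshold (0.0 by default). The dashboard may define this differently.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 7 active tokens out of 100,000 sampled tokens.
sample = [0.0] * 99_993 + [1.5] * 7
density = activation_density(sample)
print(f"{density:.3%}")  # prints 0.007%
```

With these made-up numbers, a density of 0.007% corresponds to roughly 7 active tokens per 100,000, which is consistent with a sparse feature that fires only on a narrow set of contexts.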