INDEX
Explanations
inappropriate language or profanity
instances of profanity and vulgar language
Negative Logits
Specific              -0.79
ItemImage             -0.78
wcs                   -0.72
è£ıè                  -0.70
inventoryQuantity     -0.70
condem                -0.69
åĬ                    -0.68
ItemThumbnailImage    -0.68
narrowing             -0.67
complication          -0.67
Positive Logits
ruary                  0.96
gerald                 0.82
gling                  0.80
ratulations            0.77
illet                  0.77
ibly                   0.75
laugh                  0.72
sth                    0.72
him                    0.72
estro                  0.72
Activations Density 0.039%