Explanations
references to toxicity in various contexts
types of toxicity
Negative Logits
Token             Logit
increí            -0.59
idoo              -0.58
incrí             -0.56
(blank)           -0.56
stiefe            -0.55
liesslich         -0.55
IndentedString    -0.53
beſch             -0.52
})->              -0.52
Airborne          -0.51
Positive Logits
Token             Logit
toxicity          2.56
toxicity          0.89
toxic             0.76
Toxicity          0.75
Toxicity          0.73
xicity            0.71
TOXIC             0.59
TOXIC             0.56
toxic             0.54
httphttps         0.52
Activations Density 0.041%
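
To put the density figure in context, here is a minimal sketch of how such a number could be computed. It assumes activation density means the fraction of sampled tokens on which the feature's activation is above zero; the page does not state this definition. The function name `activation_density` and the `threshold` parameter are hypothetical and used only for illustration.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumption: "density" is read as the share of tokens on which the
    feature fires at all. This is an illustrative definition, not one
    confirmed by the page.
    """
    if len(activations) == 0:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: the feature fires on 41 of 100,000 tokens.
sample = [1.0] * 41 + [0.0] * (100_000 - 41)
density = activation_density(sample)
print(f"{density:.3%}")
```

Under this reading, a density of 0.041% would correspond to the feature firing on roughly 41 of every 100,000 tokens.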