Explanations

numerical data, toxicity

Negative Logits

ਪ    -0.08
Popularity    -0.08
 popularity    -0.08
 sogenannten    -0.07
گران    -0.07
egal    -0.07
 sogenannte    -0.07
ులతో    -0.07
 cheer    -0.07
ARRY    -0.07

Positive Logits

 составляет    0.12
≈    0.10
�    0.10
¥    0.10
是多少    0.09
 yaklaşık    0.09
为    0.09
 około    0.09
 approx    0.09
 लगभग    0.09

Activations Density: 0.032%