INDEX
Explanations
problematic and offensive terms
Negative Logits
каждой        0.51
Each          0.44
EACH          0.40
வ             0.40
擾             0.39
each          0.39
}=(-          0.39
Each          0.38
collectors    0.38
ко            0.37
Positive Logits
been          0.46
spearheaded   0.46
been          0.45
করেছে         0.44
rebranded     0.43
teknologi     0.41
www           0.41
தற்போது       0.40
နောက်         0.40
Blvd          0.39
Activations Density 0.004%