INDEX
Explanations
Phrases that label people or concepts as 'fake', 'dangerous', or otherwise derogatory, often using quotation marks to emphasize the characterization.
Negative Logits
HasFactory  -0.66
دانشنامهٔ  -0.62
ſtand  -0.58
autorytatywna  -0.58
raiſ  -0.56
nahilalakip  -0.52
houſe  -0.51
ⓧ  -0.50
曖昧さ回避  -0.50
httphttps  -0.50
Positive Logits
dcterms  0.44
fluoro  0.44
RestTemplate  0.41
賀状  0.36
">:  0.36
gefü  0.35
Коммента  0.35
epit  0.35
unoz  0.34
Kün  0.34
Activations Density: 0.605%