Explanations
conditional thinking
expressions that denote close, emotionally intimate relationships or strong personal bonds between people.
graphic or explicit descriptions of cannibalism (eating human flesh) or similarly gruesome content.
Negative Logits
罹             0.33
triumphant    0.30
murderous     0.30
elegan        0.29
auster        0.29
شاه           0.28
katalog       0.28
produkt       0.28
auspices      0.28
fondamentali  0.27
Positive Logits
如果            0.39
Pokud         0.34
Nếu           0.33
Pokud         0.33
μπορεί        0.33
if            0.32
যদি           0.32
moze          0.32
можуть        0.32
যদি           0.31
Activations Density: 0.098%