Explanations
Examples leading to negative outcomes or results
Negative Logits
or        1.16
hoặc      0.91
atau      0.89
ou        0.89
или       0.88
nebo      0.86
أو        0.83
یا        0.79
或者      0.79
oder      0.78
Positive Logits
crumbling      0.93
evil           0.89
traitor        0.87
న్ను            0.86
deteriorating  0.86
expiring       0.86
Hitler         0.85
dubious        0.84
Objection      0.83
Remove         0.83
Activations Density 0.470%