INDEX
Explanations
death, penalty, threats, core
Negative Logits
Wren    1.00
南京    0.98
broken    0.97
蓣    0.94
aec    0.93
ação    0.90
torn    0.84
massac    0.84
vomit    0.84
anity    0.83
Positive Logits
penalty    1.12
penalty    1.05
Penalty    1.04
Penalty    1.03
мозга    1.02
せる    0.98
posaż    0.96
bed    0.92
مة    0.92
缓    0.91
Activations Density 0.089%