INDEX
Explanations
flawless, pristine, spotless
Negative Logits
coarser     0.96
inadequ     0.89
唆          0.83
louder      0.82
mauvais     0.80
сексуа      0.80
divisive    0.80
διαφορε     0.79
weaker      0.77
纣          0.76
Positive Logits
flawless    1.80
pristine    1.76
spotless    1.73
perfection  1.56
Perfection  1.56
immaculate  1.56
perfect     1.48
Perfect     1.44
perfek      1.38
perfetta    1.37
Activations Density: 0.236%