INDEX
Explanations
unconventional or unpopular
Negative Logits
reliably    0.79
fiable      0.71
reliable    0.68
можно       0.65
確実に      0.65
aider       0.63
しっかり    0.62
สามารถ      0.58
健全        0.58
można       0.58
Positive Logits
unorthodox        1.25
unconventional    1.22
imperfect         1.00
unpopular         1.00
Controvers        1.00
controversial     0.97
Sometimes         0.92
Sometimes         0.90
Difficult         0.88
sometimes         0.86
Activations Density 0.001%