INDEX
    Explanations

    unconventional or unpopular

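    Negative Logits
        " reliably"        0.79
        " fiable"          0.71   (French: "reliable")
        " reliable"        0.68
        " можно"           0.65   (Russian: "can, may, it is possible")
        "確実に"           0.65   (Japanese: "certainly, reliably")
        "aider"            0.63   (French: "to help")
        " しっかり"        0.62   (Japanese: "firmly, properly")
        " สามารถ"          0.58   (Thai: "can, be able to")
        "健全"             0.58   (Japanese/Chinese: "sound, healthy")
        " można"           0.58   (Polish: "one can, it is possible")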

    Positive Logits
        " unorthodox"      1.25
        " unconventional"  1.22
        " imperfect"       1.00
        " unpopular"       1.00
        " Controvers"      1.00
        " controversial"   0.97
        "Sometimes"        0.92
        " Sometimes"       0.90
        " Difficult"       0.88
        " sometimes"       0.86
    Activation Density: 0.001%

    No Known Activations