    Explanations

    common, popular, typical

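    NEGATIVE LOGITS
    0.98   " réellement"       (French: "really, actually")
    0.95   " véritable"        (French: "true, genuine")
    0.93   " esclusivamente"   (Italian: "exclusively")
    0.91   " Truly"
    0.88   " Literally"
    0.87   " 진짜"             (Korean: "really, genuine")
    0.84   " genuinely"
    0.84   "本当に"            (Japanese: "really, truly")
    0.83   " literalmente"     (Spanish/Portuguese: "literally")
    0.81   " defies"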
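    POSITIVE LOGITS
    1.99   " popular"
    1.77   "popular"
    1.76   "常用的"            (Chinese: "commonly used")
    1.74   "Popular"
    1.73   " common"
    1.72   "常見"              (Traditional Chinese: "common")
    1.71   " commonly"
    1.66   "常见的"            (Simplified Chinese: "common")
    1.61   " популяр"          (Russian, fragment of "popular")
    1.58   "常用"              (Chinese: "commonly used")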
    Activation Density: 1.065%

    No Known Activations