    Negative Logits
    " اكيد"  0.49
    "แน่นอน"  0.48
    " tentunya"  0.46
    " évidemment"  0.46
    " sicherlich"  0.43
    " obviously"  0.43
    "겠지만"  0.42
    " sicuramente"  0.42
    " নিঃসন্দেহে"  0.42
    " oczywiście"  0.41
    Positive Logits
    " surprisingly"  1.11
    "Surprisingly"  0.95
    " Surprisingly"  0.91
    "竟然"  0.88
    "surprisingly"  0.86
    " actually"  0.81
    "居然"  0.79
    "actually"  0.79
    " oddly"  0.78
    " strangely"  0.76
    Activation Density: 0.075%

    No Known Activations