    Negative Logits
    Really          0.42
     якобы          0.42
     approximately  0.41
    사실            0.41
     প্রকৃতপক্ষে      0.40
    ?               0.40
     although       0.39
    実に            0.39
    σ               0.39
    Ultimately      0.38
    Positive Logits
     definitely       0.60
     Definitely       0.50
    !”                0.50
     llena            0.49
     worthwhile       0.48
    Definitely        0.47
     préférable       0.45
    !!”               0.45
     definitivamente  0.44
    !)                0.44
    Activation Density: 0.004%

    No Known Activations