INDEX
    Negative Logits
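    "不允许" (Chinese: not allowed)        0.62
    " asla" (Turkish: never)             0.59
    "nicht" (German: not)                0.58
    " nejen" (Czech: not only)           0.57
    " unnecessary"                       0.57
    " прекрасно" (Russian: wonderfully)  0.57
    "ine"                                0.57
    " unnecessarily"                     0.55
    "Gew"                                0.55
    "Waar"                               0.55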
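    Positive Logits
    "正式" (Chinese: formal, official)     0.86
    " основных" (Russian: main, basic)   0.84
    " formal"                            0.75
    " основные" (Russian: main, basic)   0.75
    " formalized"                        0.75
    " formally"                          0.72
    "主要的" (Chinese: main, primary)      0.71
    "vinced"                             0.70
    " 주요" (Korean: major, main)          0.69
    " основным" (Russian: main, basic)   0.68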
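Lists like the Negative Logits and Positive Logits lists are commonly produced by projecting a feature's decoder direction onto every token's unembedding vector and keeping the highest and lowest scores. The sketch below assumes that procedure and uses the raw dot product. The dashboard may normalize or rescale its scores, and it prints the Negative Logits values without a minus sign, whereas this sketch returns signed values. The function names `logit_effects` and `top_logits` and the toy vectors are hypothetical.

```python
from typing import Dict, List, Tuple

def logit_effects(decoder_dir: List[float],
                  unembed: Dict[str, List[float]]) -> Dict[str, float]:
    """Dot product of a feature's decoder direction with each token's unembedding vector."""
    return {
        token: sum(d * u for d, u in zip(decoder_dir, vec))
        for token, vec in unembed.items()
    }

def top_logits(effects: Dict[str, float],
               k: int) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most promoted and the k most suppressed (token, effect) pairs."""
    ranked = sorted(effects.items(), key=lambda item: item[1])  # ascending by effect
    most_negative = ranked[:k]        # smallest (most negative) effects first
    most_positive = ranked[::-1][:k]  # largest (most positive) effects first
    return most_positive, most_negative

# Hypothetical toy setup: a 2-dimensional residual stream and a three-token vocabulary.
unembed = {
    " formal": [1.0, 0.0],
    " nicht": [-1.0, 0.5],
    " основные": [0.8, 0.1],
}
positive, negative = top_logits(logit_effects([1.0, 0.0], unembed), k=1)
print(positive, negative)
```

With a real model, `decoder_dir` would be one row of the autoencoder's decoder matrix and `unembed` would cover the full vocabulary. In practice, a vectorized matrix product replaces the Python loop.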
    Activation Density 0.424%
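Activation density is usually reported as the share of sampled token positions on which the feature is active. The sketch below assumes "active" means an activation strictly above zero (the `threshold` parameter is a hypothetical knob). Under that reading, 0.424% corresponds to roughly 424 firing positions per 100,000 tokens.

```python
from typing import Sequence

def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of token positions where the feature's activation exceeds `threshold`."""
    if len(activations) == 0:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)

# Hypothetical sample: the feature fires on 424 of 100,000 token positions.
sample = [0.0] * 99_576 + [1.0] * 424
print(f"{activation_density(sample):.3f}%")
```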

    No Known Activations