    Negative Logits
    Token  Logit
    " 이전"  0.64
    " назвал"  0.60
    "造成的"  0.59
    " Virginia"  0.58
    " इसने"  0.57
    " Manitoba"  0.55
    " Korean"  0.55
    " Alabama"  0.55
    "📏"  0.54
    "用的"  0.54
    Positive Logits
    Token  Logit
    " while"  1.06
    " mientras"  0.98
    "while"  0.98
    " enquanto"  0.98
    " sambil"  0.95
    " WHILE"  0.94
    " mentre"  0.90
    " where"  0.86
    " terwijl"  0.86
    " där"  0.83
    Act Density 0.000%

    No Known Activations