INDEX
    Explanations
    Negative Logits
        "in"                0.45
        "적인"              0.44
        "i"                 0.42
        "c"                 0.41
        " articul"          0.40
        " automobiles"      0.38
        " hospitalization"  0.38
        [token missing]     0.37
        " fabricate"        0.36
        " transplantation"  0.35
    Positive Logits
        "ный"               0.59
        ":"                 0.58
        "ના"                0.55
        "ка"                0.50
        "ের"                0.48
        " continuamos"      0.48
        "naam"              0.47
        "ın"                0.47
        "6"                 0.47
        " teníamos"         0.47
    Activation Density: 0.177%

    No Known Activations