    Explanations

    translating from one language to another

    Negative Logits
    "VCT"          0.36
    "很容易"        0.35
                   0.35
    " kart"        0.34
    " পড়েছে"       0.34
    "ldi"          0.33
    "ikken"        0.33
    " youngest"    0.33
    "kart"         0.33
    "ہر"           0.32
    Positive Logits
    " behavior"       0.44
    "性能"             0.44
    " 성능"            0.41
    " performance"    0.40
    " Behavior"       0.40
    " Tether"         0.40
    " specificity"    0.40
    " ability"        0.39
    " flattening"     0.39
    "̻"               0.39
    Activation Density: 0.001%

    No Known Activations