    Negative Logits
    r              0.54
    动作           0.50
    ната           0.49
    見える         0.48
    elijk          0.46
    eração         0.45
     görünt        0.45
     supérieure    0.45
     الموجود       0.45
     blo           0.44
    Positive Logits
    (token not shown)    0.73
     bagi                0.68
     FOR                 0.64
    แรม                  0.64
    determining          0.63
     アイス              0.63
    (token not shown)    0.61
    (token not shown)    0.61
    duire                0.61
    ль                   0.60
    Activation Density: 0.026%

    No Known Activations