INDEX
    Explanations

    rules, specific detail, vary significantly

    Negative Logits
    (token not rendered)  0.48
    "ဆင့်"  0.48
    "ურთი"  0.48
    "riente"  0.45
    " Enhancement"  0.45
    "rion"  0.45
    "clouds"  0.44
    "Θ"  0.43
    "<bos>"  0.43
    " பிள்ளை"  0.43
    Positive Logits
    " penalized"  0.52
    " relieved"  0.51
    " prioritized"  0.51
    " ограничен"  0.51
    "тана"  0.50
    " taken"  0.50
    " oficiais"  0.48
    " depressed"  0.47
    " голов"  0.47
    " backed"  0.46
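Feature dashboards of this kind typically build the Positive and Negative Logits lists by projecting the feature's decoder direction onto each token's unembedding vector and keeping the highest and lowest scores. The sketch below assumes that method; the function name `top_logit_effects`, the dict-of-lists unembedding, and the toy tokens are hypothetical and are not taken from this dashboard.

```python
from typing import Dict, List, Tuple

Ranked = List[Tuple[str, float]]

def top_logit_effects(
    decoder_dir: List[float],
    unembed: Dict[str, List[float]],
    k: int = 10,
) -> Tuple[Ranked, Ranked]:
    """Hypothetical helper: score every token by the dot product of the
    feature's decoder direction with that token's unembedding vector."""
    scores = {
        tok: sum(d * u for d, u in zip(decoder_dir, vec))
        for tok, vec in unembed.items()
    }
    # Ascending order: most negative effect first.
    ranked = sorted(scores.items(), key=lambda kv: kv[1])
    negative = ranked[:k]        # tokens the feature pushes down most
    positive = ranked[::-1][:k]  # tokens the feature pushes up most
    return positive, negative

# Toy 2-dimensional example with made-up vectors (not real model data).
unembed = {"tok_a": [1.0, 0.0], "tok_b": [-1.0, 0.5], "tok_c": [0.5, 0.5]}
pos, neg = top_logit_effects([0.5, 0.1], unembed, k=2)
```

In the toy example `tok_a` has the largest positive score and `tok_b` the most negative one, so each heads its respective list.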
    Activation Density: 0.000%

    No Known Activations
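An activation density of 0.000% agrees with "No Known Activations": the feature did not fire on any sampled token. The sketch below assumes density means the percentage of sampled tokens whose activation is above zero. The function name `activation_density` and its `threshold` parameter are hypothetical.

```python
from typing import Iterable

def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Hypothetical helper: percentage of tokens whose activation exceeds `threshold`."""
    total = 0
    active = 0
    for a in activations:
        total += 1
        if a > threshold:
            active += 1
    # Guard against an empty sample instead of dividing by zero.
    return 100.0 * active / total if total else 0.0

# A feature that never fires reports 0.000%, matching the value above.
print(f"{activation_density([0.0, 0.0, 0.0, 0.0]):.3f}%")
```

The percentage scaling and three-decimal formatting are chosen to match how the value is displayed here.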