INDEX
    Negative Logits
    Token               Logit
    " sentenced"        0.98
    " emotions"         0.87
    " hostility"        0.87
    " transgender"      0.87
    " collaborated"     0.84
    " disagreements"    0.84
    " erythe"           0.84
    " transnational"    0.84
    " hates"            0.84
    " monasteries"      0.84
    Positive Logits
    Token               Logit
    "用心"              0.70
    "<unused713>"       0.70
    (missing)           0.69
    "Uncle"             0.68
    "不错的"            0.66
    "Email"             0.64
    "viewport"          0.63
    (missing)           0.63
    " وسلم"             0.63
    " Uncle"            0.63
    Activation Density: 0.004%

    No Known Activations