    Explanations

    negativity and immunity

    Negative Logits
     ejemplos  0.48
     중요한  0.45
     agricoles  0.45
     agricole  0.45
     المالية  0.44
     belangrijk  0.43
     الاجتماعية  0.43
    otherapie  0.43
     ব্যবসার  0.42
     литератур  0.42
    Positive Logits
     perimeter  0.41
     sacrifice  0.41
     distrust  0.40
     hate  0.39
     sacrifices  0.39
     preventing  0.38
     hence  0.38
    ron  0.38
     enemy  0.38
     hatred  0.38
    Act Density 0.004%

    No Known Activations