INDEX
    Explanations
    Negative Logits
    Token         Logit
     définiti     -0.69
    AsStream      -0.59
    zellen        -0.58
    borgen        -0.57
     addSubview   -0.56
    autogui       -0.55
     Konink       -0.54
     dilan        -0.54
     بيها         -0.54
    ksom          -0.54
    Positive Logits
    Token    Logit
     ]       1.11
    "]);     1.09
    "])      0.97
    )";      0.97
    '))      0.97
    ']))     0.96
    ".       0.94
    }")      0.94
    )");     0.94
    )"),     0.91
    Activation Density: 1.578%

    No Known Activations