INDEX
    Explanations
    Negative Logits
    " coloc"      -0.07
    "ادة"         -0.07
    " cuda"       -0.07
    "_TRANSL"     -0.07
    "website"     -0.06
    "كات"         -0.06
    " Duncan"     -0.06
    " була"       -0.06
    " wereld"     -0.06
    ".compose"    -0.06
    Positive Logits
    " semaphore"      0.12
    " Semaphore"      0.11
    "Semaphore"       0.09
    "_semaphore"      0.09
    "aphore"          0.07
    " malware"        0.07
    " helicopters"    0.06
    "\tsem"           0.06
    " temper"         0.06
    " disagrees"      0.06
    Act Density 0.001%

    No Known Activations