    Negative Logits
    " blob"         -0.85
    " hotels"       -0.77
    " كومونز"       -0.76
    " hotel"        -0.75
    " متعلقه"       -0.75
                    -0.73
    "aarrggbb"      -0.73
    " blobs"        -0.69
    "didReceive"    -0.67
    "يكب"           -0.66
    Positive Logits
    "p"             0.54
    "g"             0.53
    "o"             0.52
    "us"            0.50
    "self"          0.49
    "Self"          0.47
    "mstyle"        0.47
    "ty"            0.47
    "catch"         0.46
    " italic"       0.46
    Activation Density: 0.008%

    No Known Activations