    NEGATIVE LOGITS
    ة             0.48
    蛋糕           0.46
    Laundry       0.45
     деньги       0.44
    ן             0.44
    یز            0.43
     Pudding      0.42
                  0.42
    Church        0.41
    nothing       0.41
    POSITIVE LOGITS
     comics       0.53
     comic        0.52
     approving    0.52
     Comic        0.49
     attenuation  0.43
     GPUs         0.42
     Comics       0.38
     approvals    0.38
     suave        0.37
    isComposite   0.37
    Activation Density: 0.002%

    No Known Activations