    NEGATIVE LOGITS
    Token     Logit
    " co"     -2.09
    " Co"     -1.48
    "co"      -1.29
    "Co"      -1.13
    " cof"    -0.79
    "ly"      -0.78
    "py"      -0.63
    "m"       -0.62
    "na"      -0.60
    "d"       -0.58
    POSITIVE LOGITS
    Token          Logit
    " purpoſe"     0.88
    "antaranya"    0.84
    " pleaſure"    0.82
    " houſe"       0.78
    " وتسجيلات"    0.75
    " leaſt"       0.75
    " Monfieur"    0.73
    " Efq"         0.73
    " enfans"      0.73
    " Reſ"         0.71
    Activation Density: 0.091%

    No Known Activations