INDEX
    Explanations
    NEGATIVE LOGITS
    "ডির"               0.46
    [token not shown]   0.42
    " තම"               0.40
    " imitate"          0.40
    [token not shown]   0.39
    "μοί"               0.38
    "仿"                0.38
    " unas"             0.37
    [token not shown]   0.36
    " mimic"            0.36
    POSITIVE LOGITS
    "Reasons"           0.44
    " çünkü"            0.41
    " reasons"          0.39
    "ologne"            0.39
    [token not shown]   0.38
    " because"          0.37
    "VJ"                0.37
    " Reasons"          0.36
    "carouselExample"   0.36
    " threefold"        0.36
    Act Density 0.008%

    No Known Activations