INDEX
    Explanations
    Negative Logits
    " marginal"       -0.07
    " spraw"          -0.06
    "пе"              -0.06
    " railway"        -0.06
    " žen"            -0.06
    " gears"          -0.06
    "(serializers"    -0.06
    "nek"             -0.06
    "writes"          -0.06
    "ieg"             -0.06
    Positive Logits
    "fusion"           0.13
    "-General"         0.07
    "φι"               0.07
    " perf"            0.07
    " Verde"           0.07
    "perse"            0.06
    " surgeons"        0.06
    "DOWNLOAD"         0.06
    " Vehicle"         0.06
    "dives"            0.06
    Act Density 0.004%

    No Known Activations