    Explanations

    abbreviation, full form, or device

    Negative Logits
    " butternut"      0.84
                      0.83
    " backpack"       0.82
    " astrophys"      0.80
    " aneu"           0.80
    " astron"         0.79
    " sprinting"      0.79
    " pupp"           0.79
    " cucumbers"      0.78
    " creatinine"     0.77
    Positive Logits
    "сный"            0.80
    "льные"           0.78
    "товые"           0.77
    "Semaphore"       0.76
    " يلي"            0.75
    "ly"              0.74
    "ны"              0.73
    " Funktions"      0.72
    "ísticas"         0.71
    "oriented"        0.71
    Activation Density: 0.001%

    No Known Activations