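    NEGATIVE LOGITS
    Tokens are quoted; a leading space is part of the token.
    Token              Logit   Gloss
    " volv"            0.72    Spanish stem of volver, "to return"
                       0.70
    " tilbage"         0.69    Danish, "back"
    " tredje"          0.68    Danish/Norwegian, "third"
    " tilbake"         0.68    Norwegian, "back"
    " contemplative"   0.68
    " vuelto"          0.66    Spanish, "returned"
    " volvió"          0.66    Spanish, "returned"
    " voltou"          0.65    Portuguese, "returned"
                       0.65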
    POSITIVE LOGITS
    Token              Logit   Gloss
    " initialize"      3.26
    " initializing"    3.11
    " Initialize"      3.11
    " initialization"  3.10
    " initializes"     3.04
    "初始化"            3.02    Chinese, "initialize"
    "Initialize"       2.94
    " Initialization"  2.88
    "initialize"       2.88
    " 初始化"           2.77    Chinese, "initialize"
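The two lists rank tokens by how strongly this feature raises or lowers their logits. One common way to produce such a ranking for a sparse-autoencoder feature is to dot the feature's decoder direction with each token's unembedding vector and sort the results. The sketch below assumes that method; the function names `logit_effects` and `top_and_bottom`, the two-dimensional vectors, and the resulting values are all hypothetical and are not taken from this dashboard.

```python
from typing import Dict, List, Tuple

# Hypothetical sketch: rank tokens by a feature's direct effect on the logits.
# Assumption: effect(token) = dot(decoder direction, unembedding vector of token).

def logit_effects(decoder_dir: List[float], unembed: Dict[str, List[float]]) -> Dict[str, float]:
    """Dot the feature's decoder direction with every token's unembedding vector."""
    return {tok: sum(d * u for d, u in zip(decoder_dir, vec)) for tok, vec in unembed.items()}

def top_and_bottom(effects: Dict[str, float], k: int) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most positive and the k most negative (token, effect) pairs."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:k], ranked[::-1][:k]  # second list is ordered most negative first

# Toy 2-dimensional model; every vector here is invented for the example.
decoder_dir = [1.0, 0.0]
unembed = {
    " initialize": [3.0, 1.0],
    " Initialize": [2.5, 0.0],
    " tilbage": [-0.5, 0.0],
    " volv": [-0.7, 2.0],
}
top, bottom = top_and_bottom(logit_effects(decoder_dir, unembed), k=2)
print(top)     # [(' initialize', 3.0), (' Initialize', 2.5)]
print(bottom)  # [(' volv', -0.7), (' tilbage', -0.5)]
```

Under this method the bottom-ranked effects come out negative. Whether the dashboard shows signed values or magnitudes for its negative list is not stated on the page.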
    Activation density: 0.762%
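Activation density is most plausibly the percentage of sampled tokens on which the feature fires. Below is a minimal sketch, assuming "fires" means an activation strictly above zero. The function name `activation_density`, the threshold, and the sample are hypothetical.

```python
from typing import List

def activation_density(activations: List[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose activation is strictly above `threshold`."""
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)

# Invented sample: 1,000 tokens, of which 8 have a positive activation.
sample = [0.0] * 992 + [1.5] * 8
density = activation_density(sample)
print(f"{density:.3f}%")  # 0.800%
```

On this reading, 0.762% would mean the feature fires on roughly 7 or 8 of every 1,000 sampled tokens.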

    No known activations.