    Negative Logits
        Token           Logit
        " LANGUAGE"     -0.06
        "itory"         -0.06
        "urator"        -0.06
        "odb"           -0.06
        "-switch"       -0.06
        " pot"          -0.06
        " Heck"         -0.06
        " remodeling"   -0.06
        "COMMON"        -0.05
        "ATIO"          -0.05
    Positive Logits
        Token           Logit
        " Beverly"      0.16
        " Gaussian"     0.12
        " gaussian"     0.11
        " Bever"        0.11
        "aussian"       0.09
        ".choices"      0.09
        " bev"          0.08
        ".Constants"    0.07
        ".bill"         0.07
        " fikir"        0.07
    Activation Density: 0.003%

    No Known Activations