INDEX
    Explanations

    references to scale or measurements in various contexts

    Negative Logits
    Token           Logit
    "<bos>"         -2.69
    "<?"            -0.65
    (whitespace)    -0.61
    " admit"        -0.60
    (not shown)     -0.59
    " put"          -0.58
    "#"             -0.56
    " defend"       -0.56
    "cup"           -0.56
    " cố"           -0.54
    Positive Logits
    Token           Logit
    " accla"        1.53
    " suspic"       1.50
    " ecru"         1.47
    " fatis"        1.43
    " jaya"         1.40
    " wien"         1.40
    " effe"         1.40
    " unwarran"     1.39
    " bandung"      1.39
    " nece"         1.37
    Activation Density: 0.048%

    No Known Activations