INDEX
    Explanations

    comfortable

    Negative Logits
     noisy    -0.07
     enormous    -0.06
     SizedBox    -0.06
     ваг    -0.06
     Christmas    -0.06
     Eighth    -0.06
     disastrous    -0.06
    лишком    -0.06
    	show    -0.06
    "]]↵    -0.06
    Positive Logits
     locals    0.07
    EZ    0.06
    eton    0.06
    ufreq    0.06
    valu    0.06
    Ab    0.06
     realizing    0.06
    presence    0.06
    Oregon    0.06
    ισμ    0.06
    Act Density 0.021%

    No Known Activations