    Negative Logits
        Token             Logit
        " Formatting"     -0.08
        " Brendan"        -0.07
        "_face"           -0.06
        "atisfaction"     -0.06
        " odst"           -0.06
        "eln"             -0.06
        " katıl"          -0.06
        " conduc"         -0.06
        " сфері"          -0.06
        " Mormons"        -0.06
    Positive Logits
        Token             Logit
        " dinner"         0.09
        (no token shown)  0.08
        " supper"         0.07
        " happier"        0.07
        " Dinner"         0.07
        " brunch"         0.07
        "!',↵"            0.07
        "icer"            0.07
        ".GroupBox"       0.06
        (run of tabs)     0.06
    Activation Density: 0.019%

    No Known Activations