INDEX
    Explanations

    entertainment

    Negative Logits
                    -0.08
    "Scaling"       -0.07
    " plane"        -0.07
    "chs"           -0.07
    "Analy"         -0.07
    "quee"          -0.07
    "Ra"            -0.07
    " measuring"    -0.07
    "Report"        -0.06
    "ible"          -0.06
    Positive Logits
    " Entertainment"    0.15
    " entertainment"    0.14
    "ertainment"        0.08
    " timeval"          0.07
    " розпов"           0.07
    "_Tis"              0.07
    " انت"              0.07
    " entert"           0.07
    "onet"              0.07
    " Restaurants"      0.07
    Act Density 0.005%

    No Known Activations