INDEX
    Explanations

    Math problem solving

    Negative Logits
     doses  -0.09
     kabinet  -0.09
     rhag  -0.09
    ებს  -0.09
     gover  -0.09
     სახლში  -0.08
    лөг  -0.08
     ქვეყანაში  -0.08
     ravim  -0.08
     نشست  -0.08

    Positive Logits
     posted  0.09
    帖子  0.09
    讨论  0.09
    网友  0.09
     discussions  0.08
     impatient  0.08
    教程  0.08
     tutorials  0.08
     debated  0.08
    .mixin  0.08
    Activation Density: 0.005%

    No Known Activations