INDEX
    Explanations

    questions following quotes

    Negative Logits
    " unfortunately"    0.44
    " importantly"      0.41
    " approximate"      0.41
    "と思っています"    0.39
    " ಉತ್ತಮ"    0.38
    "层次"    0.38
    " fortunately"      0.37
    "今回は"    0.37
    "وامل"    0.37
    " approximates"     0.37
    Positive Logits
    "why"        1.44
    " why"       1.41
    " Why"       1.38
    "Why"        1.32
    " WHY"       1.30
    " waarom"    1.23
    "WHY"        1.21
    "为什么"    1.19
    "なぜ"    1.15
    "为何"    1.11
    Act Density 0.016%

    No Known Activations