INDEX
    Explanations

    sentences relating to actions and activities being done by people

    dialogue and interactions between characters

    Negative Logits
    ortium          -0.60
    ador            -0.59
     disadvantages  -0.59
    -)              -0.58
    andra           -0.56
     hindsight      -0.55
     centr          -0.54
     disadvantage   -0.54
     disagrees      -0.54
     nowadays       -0.54
    Positive Logits
    FIR             0.64
     accordingly    0.64
     proceeded      0.60
     prest          0.60
    SPONSORED       0.59
    oka             0.59
    ãĤ©             0.59
    ãĥŃ             0.58
    hid             0.57
    hello           0.56
    Activation Density: 0.655%

    No Known Activations