INDEX
    Explanations

    instances of uncertainty or negation

    Negative Logits
     Houſe
    -0.67
    ValueStyle
    -0.61
     houſe
    -0.57
     preſent
    -0.57
     purpoſe
    -0.55
    RegressionTest
    -0.54
    leſs
    -0.54
     AssemblyTitle
    -0.52
    wiſe
    -0.52
    ſelves
    -0.51
    Positive Logits
     need
    0.67
     have
    0.63
     get
    0.57
     DID
    0.55
     does
    0.54
     Does
    0.54
     do
    0.53
    0.52
     give
    0.52
     did
    0.52
    Activation Density 0.142%

    No Known Activations