INDEX
    Explanations

    phrases that indicate hypothetical or conditional scenarios

    Negative Logits
    "IDE"       -0.70
    "north"     -0.69
    " Beaver"   -0.69
    "inka"      -0.68
    "hin"       -0.65
    "anka"      -0.65
    " Bere"     -0.65
    "fman"      -0.63
    "Balt"      -0.63
    "pillar"    -0.62
    Positive Logits
    " sounds"    0.85
    " removes"   0.74
    " causes"    0.74
    " leaves"    0.73
    " looks"     0.72
    " delet"     0.72
    " would"     0.72
    " fixes"     0.71
    " entails"   0.70
    " amounts"   0.70
    Activation Density: 0.270%

    No Known Activations