    Explanations

    sentences related to laws, policies, and legal actions

    Negative Logits
     sleeper       -0.75
     toy           -0.68
     wardrobe      -0.67
     closet        -0.66
     optional      -0.66
     ditch         -0.65
     dummy         -0.65
     phantom       -0.64
     silhouette    -0.64
     juice         -0.64
    Positive Logits
     Their         0.99
     Though        0.97
     Previously    0.96
     Among         0.95
     Speaking      0.94
     Essentially   0.94
     Having        0.93
     His           0.93
     Similarly     0.93
     Whereas       0.91
    Activation Density 0.264%

    No Known Activations