INDEX
    Explanations

    phrases that contrast different sides or perspectives on a topic

    Negative Logits
    Token           Logit
    "theless"       -0.67
    "zon"           -0.61
    " RELEASE"      -0.60
    "Effective"     -0.60
    "Progress"      -0.60
    "cit"           -0.60
    " Updated"      -0.59
    "burg"          -0.57
    "zan"           -0.56
    "anned"         -0.55
    Positive Logits
    Token           Logit
    " side"         1.22
    "worldly"       1.20
    " hemisphere"   0.93
    " hand"         0.92
    " half"         0.89
    "most"          0.88
    "iest"          0.83
    " Side"         0.83
    " end"          0.83
    " halves"       0.81
    Activation Density: 0.629%

    No Known Activations