INDEX
    Explanations
    Negative Logits
    schild          -0.71
    Reviewer        -0.66
    Maker           -0.66
    tumblr          -0.65
    20439           -0.64
    DragonMagazine  -0.60
     cylinders      -0.59
    kid             -0.58
     disks          -0.58
    beit            -0.55
    Positive Logits
    .,              1.61
    .?              1.36
    .;              1.29
    .:              1.19
    ./              1.11
    .,"             1.11
    .—              1.10
    .),             0.97
    .–              0.93
    orea            0.92
    Activation Density: 0.022%

    No Known Activations