INDEX
    Explanations

    references to the concept of "objectivity" and related philosophical terms

    Negative Logits
    Token        Logit
    aber         -0.16
    izu          -0.15
    uen          -0.15
    avigator     -0.15
    agra         -0.14
    ackers       -0.14
    fm           -0.14
    inston       -0.14
    iden         -0.14
    itational    -0.14
    Positive Logits
    Token           Logit
    ively           0.26
    ors             0.18
    alist           0.18
    hood            0.17
    ivity           0.16
    ives            0.16
    ually           0.16
    andalone        0.15
    ponge           0.15
     Revolutionary  0.15
    Activation Density: 0.070%

    No Known Activations