    Explanations

    sociology and anthropology

    Negative Logits
     on       0.47
              0.46
     h        0.46
     sexu     0.46
     S        0.44
              0.42
     air      0.42
     (        0.41
     glyph    0.40
     og       0.40
    Positive Logits
    truths    0.55
    ма        0.51
    o         0.48
              0.45
    remeno    0.43
    e         0.43
    Pred      0.43
    that      0.43
    지만      0.43
    Buying    0.43
    Act Density 0.000%

    No Known Activations