    Negative Logits
    Token            Logit
    "such"           0.58
    "Σ"              0.52
    "a"              0.51
    "f"              0.48
    "society"        0.48
    " Housewives"    0.48
    "something"      0.47
    "ν"              0.47
    "الت"            0.46
    "Fog"            0.46
    Positive Logits
    Token            Logit
    " thumb"         0.79
    " pouce"         0.71
    " thumbs"        0.68
    " Thumb"         0.66
    " ใช่"           0.55
    "Thumb"          0.54
    " grü"           0.52
    " thump"         0.52
                     0.51
    " liking"        0.51
    Activation Density: 0.004%

    No Known Activations