INDEX
    Explanations

    episode or chapter numbers

    Negative Logits
                  0.50
     Political    0.46
    ן             0.46
                  0.45
    section       0.44
    text          0.44
    Construction  0.44
                  0.43
    一种            0.42
                  0.42
    Positive Logits
     drifting   0.49
     palais     0.49
     monot      0.48
     manche     0.47
     teasing    0.47
     wel        0.46
     bay        0.46
     centr      0.45
     cré        0.44
     rég        0.44
    Act Density 0.001%

    No Known Activations