INDEX
    Explanations

    overview of features and history

    Negative Logits
    "这将"                0.44   (Chinese: "this will")
    " directe"            0.42
    "Clearly"             0.40
    "CLEAR"               0.38
    "!!!!!!!!!!!!!!!!"    0.38
    "!!!!"                0.37
    " Clearly"            0.37
    "Explicit"            0.36
    "明确"                0.36   (Chinese: "clear, explicit")
    " instantiated"       0.36
    Positive Logits
    " significance"       1.05
    " Significance"       1.00
    " notable"            0.91
    "特点"                0.85   (Chinese: "characteristics")
    " origins"            0.84
    "ificance"            0.83
    " controversy"        0.83
    " Facts"              0.83
    " history"            0.82
    " Notable"            0.81
    Act Density 0.140%

    No Known Activations