    Explanations

    phrases indicating the significance or importance of various subjects or ideas

    Negative Logits
    Token             Logit
    "anzi"            -0.14
    "eref"            -0.13
    "ắt"              -0.13
    "¶Į"              -0.13
    " mê"             -0.13
    "ế"               -0.13
    "ätz"             -0.12
    "abwe"            -0.12
    "alg"             -0.12
    "аннÑı"           -0.12
    Positive Logits
    Token             Logit
    " importance"      0.48
    " need"            0.41
    " necessity"       0.35
    " significance"    0.33
    " value"           0.33
    " Importance"      0.33
    " dangers"         0.31
    "need"             0.31
    "import"           0.31
    " centr"           0.30
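    Feature dashboards commonly produce the positive and negative logit lists by projecting the feature's decoder direction through the model's unembedding matrix and then reading off the largest and smallest entries. This page does not say how its values were computed, so what follows is only a minimal sketch under that assumption. The names `top_logit_tokens`, `decoder_direction`, and `unembed`, along with the toy vocabulary and weights, are hypothetical. They are not this model's real weights. The sketch also skips any layer norm the real model may apply before unembedding.

```python
from typing import List, Sequence, Tuple


def top_logit_tokens(
    decoder_direction: Sequence[float],
    unembed: Sequence[Sequence[float]],  # hypothetical [d_model][vocab_size] weights
    vocab: Sequence[str],
    k: int = 3,
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most positive and k most negative tokens for one feature.

    Assumption: the logit effect is decoder_direction @ unembed (a direct
    logit reading with no layer norm), which may differ from the dashboard.
    """
    d_model = len(decoder_direction)
    # Dot the feature direction with each token's unembedding column.
    logits = [
        sum(decoder_direction[i] * unembed[i][t] for i in range(d_model))
        for t in range(len(vocab))
    ]
    ranked = sorted(zip(vocab, logits), key=lambda pair: pair[1])
    negative = ranked[:k]        # most negative first
    positive = ranked[::-1][:k]  # most positive first
    return positive, negative


# Toy example with made-up weights (d_model = 2, five tokens).
vocab = [" importance", " need", "anzi", "eref", " value"]
direction = [1.0, -0.5]
unembed = [
    [0.4, 0.3, -0.1, -0.1, 0.2],
    [-0.2, -0.1, 0.1, 0.05, 0.0],
]
positive, negative = top_logit_tokens(direction, unembed, vocab, k=2)
print([tok for tok, _ in positive])  # → [' importance', ' need']
print([tok for tok, _ in negative])  # → ['anzi', 'eref']
```

    Keeping the leading space in tokens such as `" importance"` matters here. With byte-level BPE, `" need"` and `"need"` are distinct vocabulary entries, and that is why both appear in the table above.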
    Activation density: 0.187%
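    Activation density is usually reported as the share of tokens in a sample on which the feature's activation is strictly positive. The page gives neither its sample nor its threshold. If that usual definition applies, 0.187% means the feature fires on roughly 1.87 of every 1,000 tokens. Below is a minimal sketch that assumes a threshold of 0.0. The function name `activation_density` is hypothetical.

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Fraction of tokens whose feature activation exceeds `threshold`.

    Assumption: a token counts as active when its activation is strictly
    greater than the threshold (0.0 by default).
    """
    total = 0
    active = 0
    for value in activations:
        total += 1
        if value > threshold:
            active += 1
    if total == 0:
        raise ValueError("need at least one activation")
    return active / total


# Toy example: 2 active tokens out of 1,000 gives a density of 0.2%.
acts = [0.0] * 998 + [1.3, 0.7]
print(f"{activation_density(acts):.3%}")  # → 0.200%
```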

    No Known Activations