INDEX
    Explanations

    statements expressing moral judgment or inconsistency over time

    Negative Logits
    "ÅĦ"  -0.16
    "аÑĢÑĩ"  -0.15
    "erte"  -0.15
    "erer"  -0.14
    "brero"  -0.14
    "ivol"  -0.14
    "ãĥ¼ãĥ³"  -0.14
    "eria"  -0.14
    "اÙĪÙĬ"  -0.14
    "μη"  -0.14
    Positive Logits
    "ä»Ĭ"  0.19
    " continue"  0.18
    "_now"  0.17
    " current"  0.17
    " today"  0.17
    " continues"  0.17
    "today"  0.16
    " ä»Ĭ"  0.16
    ".now"  0.16
    "current"  0.16
    Act Density 0.145%

    No Known Activations