INDEX
    Explanations

    challenging assumptions or ideas

    Negative Logits
    Token             Logit
    "መል"              0.45
    " squadra"        0.41
    "ದ್ದರಿಂದ"           0.41
    " rozpozn"        0.40
    " スク"            0.40
    "ккей"            0.40
    " envisions"      0.40
    (no token text)   0.40
    "चीत"             0.40
    " 해결"            0.40
    Positive Logits
    Token             Logit
    " validity"       0.58
    " assumptions"    0.57
    " questioning"    0.56
    " notion"         0.55
    " abuse"          0.52
    " excessive"      0.51
    " unjust"         0.51
    "质疑"            0.51
    " tyranny"        0.49
    " hegemony"       0.47
    Activation Density: 0.051%

    No Known Activations