    Explanations

    newly discovered knowledge

    Negative Logits
        Token              Logit
        " principals"      0.42
        " grueling"        0.40
        "ریل"              0.40
        " mesta"           0.39
        " misfortune"      0.39
        " عارف"            0.38
        "Entries"          0.38
        " displeasure"     0.38
        " ils"             0.38
        "没什么"           0.38
    Positive Logits
        Token              Logit
        " overlooked"      0.49
        " идея"            0.46
        " unexpected"      0.45
        "pointed"          0.45
        " clever"          0.44
        " surprisingly"    0.44
        " unexpectedly"    0.43
        " considered"      0.43
        " विचार"           0.42
        "idea"             0.42
    Activation Density: 0.018%

    No Known Activations