INDEX
    Explanations

    references to uncovering hidden truths or secrets

    Negative Logits
    Token       Logit
    "åѤ"        -0.15
    "ë´ī"       -0.15
    "å¡ļ"       -0.15
    "acus"      -0.14
    "çī¹èī²"    -0.14
    "ÎŃÏģγ"     -0.14
    "æŀĿ"       -0.13
    "ronic"     -0.13
    "itten"     -0.13
    "Chance"    -0.13
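Several negative-logit tokens (for example çī¹èī²) look like mojibake. The page does not name the model, but if it uses GPT-2's byte-level BPE vocabulary, these are raw vocabulary strings in which each UTF-8 byte is stored as a printable stand-in character. The sketch below inverts that byte-to-character table. Under that assumption, çī¹èī² decodes to 特色 and ë´ī to 봉. The function name `decode_token` is hypothetical. Treating characters outside the byte alphabet as already-decoded text is also an assumption, introduced to cope with extraction damage such as the trailing γ in ÎŃÏģγ. Entries that are partial byte sequences (åѤ may be one) come out with replacement characters.

```python
def bytes_to_unicode() -> dict:
    """GPT-2-style reversible table mapping each byte 0..255 to a printable char."""
    # Printable bytes map to themselves: '!'..'~', '¡'..'¬', '®'..'ÿ'.
    bs = list(range(0x21, 0x7F)) + list(range(0xA1, 0xAD)) + list(range(0xAE, 0x100))
    cs = bs[:]
    n = 0
    # Remaining bytes (controls, space, 0x7F..0xA0, 0xAD) map to U+0100 onward.
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return {b: chr(c) for b, c in zip(bs, cs)}


CHAR_TO_BYTE = {c: b for b, c in bytes_to_unicode().items()}


def decode_token(token: str) -> str:
    """Hypothetical helper: turn a byte-level vocabulary string into readable text."""
    data = bytearray()
    for ch in token:
        if ch in CHAR_TO_BYTE:
            data.append(CHAR_TO_BYTE[ch])
        else:
            # Assumption: characters outside the byte alphabet (e.g. a plain
            # space or an already-decoded letter) are re-encoded as UTF-8.
            data.extend(ch.encode("utf-8"))
    # Partial multi-byte tokens cannot form valid UTF-8, so mark them with U+FFFD.
    return data.decode("utf-8", errors="replace")


for tok in ["çī¹èī²", "ë´ī", "å¡ļ", "æŀĿ", "ÎŃÏģγ", " truth"]:
    print(repr(tok), "->", repr(decode_token(tok)))
```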
    Positive Logits
    Token       Logit
    " truth"    0.64
    "truth"     0.52
    " Truth"    0.50
    " truths"   0.49
    " secrets"  0.49
    " true"     0.48
    "Truth"     0.45
    " verdad"   0.41
    " secret"   0.41
    "true"      0.38
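The page does not say how these logit scores are produced. A common approach for sparse-autoencoder feature dashboards, assumed here, is to project the feature's decoder direction through the model's unembedding matrix. The dashboard then lists the k tokens with the highest scores and the k with the lowest. The sketch below shows that selection on a toy two-dimensional model. The names `feature_logits` and `top_and_bottom` and all numbers are hypothetical. Real pipelines may also fold in a final layer norm, which is omitted here.

```python
from typing import List, Sequence, Tuple


def feature_logits(decoder_dir: Sequence[float],
                   unembed: Sequence[Sequence[float]]) -> List[float]:
    """Score every vocabulary token for one feature (assumed logit-lens style).

    unembed is d_model x vocab_size, so the score for token t is
    sum_i decoder_dir[i] * unembed[i][t].
    """
    vocab_size = len(unembed[0])
    return [sum(d * row[t] for d, row in zip(decoder_dir, unembed))
            for t in range(vocab_size)]


def top_and_bottom(scores: Sequence[float], vocab: Sequence[str],
                   k: int) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k highest-scoring and k lowest-scoring (token, score) pairs."""
    ranked = sorted(zip(vocab, scores), key=lambda pair: pair[1])
    bottom = ranked[:k]          # most negative first
    top = ranked[::-1][:k]       # most positive first
    return top, bottom


# Toy example with made-up numbers: 2 model dimensions, 4 tokens.
vocab = [" truth", " secret", "acus", "ronic"]
decoder_dir = [1.0, 0.5]
unembed = [[0.6, 0.4, -0.1, 0.0],
           [0.1, 0.2, -0.1, -0.2]]

scores = feature_logits(decoder_dir, unembed)
top, bottom = top_and_bottom(scores, vocab, k=2)
print("positive:", top)
print("negative:", bottom)
```

Sorting the full vocabulary is simple but O(V log V). For a real vocabulary of tens of thousands of tokens, a partial selection such as `heapq.nlargest` would avoid the full sort.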
    Activation Density: 0.226%
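Activation density is usually read as the percentage of tokens on which the feature fires. That reading, and the zero threshold below, are assumptions, since the page does not define the statistic. Under that reading, 0.226% corresponds to roughly 226 firing tokens per 100,000. A minimal sketch with a hypothetical `activation_density` function:

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose feature activation exceeds threshold (assumed definition)."""
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)


# Toy example: the feature fires on 1 of 4 tokens.
print(activation_density([0.0, 0.3, 0.0, 0.0]))  # 25.0

# 226 firing tokens out of 100,000 gives the density shown on this page.
print(activation_density([1.0] * 226 + [0.0] * (100_000 - 226)))
```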

    No Known Activations