INDEX
    Explanations

    expressions of honesty or truthfulness

    Negative Logits
    docx            -0.60
    Попис           -0.53
    homonymie       -0.53
    avajillas       -0.52
    mea             -0.52
     Ma             -0.50
    mul             -0.50
    Ma              -0.50
    成了            -0.49
     ordine         -0.49
    Positive Logits
    \{\\            0.94
     practically    0.90
    basically       0.76
     myſelf         0.76
    __":            0.75
     EClass         0.74
     contextLoads   0.73
    assertTrue      0.73
    Basically       0.72
     Honestly       0.72
    Activation Density: 0.076%

    No Known Activations