    Explanations

    statements and claims related to evidence and truthfulness

    Negative Logits
    ("$.        -0.46
    Scrolled    -0.42
    truded      -0.42
    بوابة       -0.42
    |/          -0.41
                -0.41
     Kran       -0.41
     strpos     -0.41
    όμε         -0.40
     '',        -0.40
    Positive Logits
    这点         0.98
    这一点       0.94
     isso       0.90
     ذلك        0.85
     ello       0.84
    Such        0.83
     bunu       0.83
     nisso      0.82
     этого      0.82
     hierfür    0.81
    Act Density 0.681%

    No Known Activations