INDEX
    Explanations

    explanation: contrast, comparison, metaphor, assumption

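    NEGATIVE LOGITS
    "各类"          0.73   (Chinese: "various kinds")
    "Apps"          0.64
    "各种"          0.64   (Chinese: "various kinds")
    " современных"  0.64   (Russian: "modern")
    "各種"          0.61   (Japanese/Traditional Chinese: "various kinds")
    "bagai"         0.60   (Malay/Indonesian: "like, as")
    "CRUD"          0.60
    " třeba"        0.59   (Czech: "for example; perhaps")
    " các"          0.59   (Vietnamese: plural marker)
    " dalších"      0.59   (Czech: "other, further")

    POSITIVE LOGITS
    " namely"       0.66
    " namelijk"     0.64   (Dutch: "namely")
    " involunt"     0.62
    " undermined"   0.61
    " undermines"   0.61
    " elicited"     0.60
    " falsehood"    0.59
    " undermine"    0.59
    "和一个"        0.58   (Chinese: "and a")
    " pretended"    0.57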
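    Positive and negative logit lists like these are commonly produced by projecting a feature's decoder direction through the model's unembedding and ranking tokens by the resulting score. The sketch below assumes that convention; the helper `top_logit_effects`, the toy vectors, and the representation of the unembedding as a dictionary of per-token columns are hypothetical and not taken from the dashboard's own code. It returns signed scores, so suppressed tokens come back negative.

```python
from typing import Dict, List, Tuple


def top_logit_effects(
    decoder_direction: List[float],
    unembedding: Dict[str, List[float]],
    k: int = 10,
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Rank tokens by how strongly a feature's decoder direction moves their logit.

    Hypothetical helper: assumes a token's logit effect is the dot product of the
    decoder direction with that token's unembedding column.
    """
    # Score every token by the dot product with its unembedding column.
    scores = {
        token: sum(d * u for d, u in zip(decoder_direction, column))
        for token, column in unembedding.items()
    }
    # Sort ascending: most suppressed tokens first, most promoted last.
    ranked = sorted(scores.items(), key=lambda item: item[1])
    negative = ranked[:k]                    # most negative scores first
    positive = list(reversed(ranked[-k:]))   # most positive scores first
    return positive, negative


# Toy usage with a 2-dimensional residual stream and three tokens.
pos, neg = top_logit_effects(
    [1.0, 0.0],
    {"a": [2.0, 0.0], "b": [-1.0, 5.0], "c": [0.5, 1.0]},
    k=1,
)
```

    In a real model the vectors would have thousands of dimensions and a vectorized matrix product would replace the Python loop; the ranking logic stays the same.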
    Activation density: 0.004%
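    Activation density is the share of tokens on which the feature fires, so 0.004% corresponds to roughly 4 tokens in every 100,000. A minimal sketch, assuming "fires" means an activation strictly above a threshold of zero (the threshold the dashboard actually uses is not stated on this page); `activation_density` is a hypothetical helper.

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Return the percentage of tokens whose activation exceeds `threshold`.

    Hypothetical helper: assumes a token counts as active when its activation
    is strictly greater than the threshold.
    """
    total = 0
    active = 0
    for value in activations:
        total += 1
        if value > threshold:
            active += 1
    if total == 0:
        raise ValueError("need at least one activation to compute a density")
    return 100.0 * active / total


# 4 active tokens out of 100,000 gives the density reported above.
print(activation_density([0.0] * 99_996 + [1.0] * 4))  # → 0.004
```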

    No Known Activations