    Explanations

    abstraction and representation

    Negative Logits
    "σταν"  0.45
    " unethical"  0.43
    " तकनीकी"  0.42
    "aini"  0.42
    " technischen"  0.41
    " modernization"  0.40
    " friendliness"  0.40
    "μφωνα"  0.40
    " पास"  0.39
    " spécifiques"  0.39
    Positive Logits
    " cognitive"  1.07
    " Cognitive"  1.00
    "Cogn"  0.98
    "cognitive"  0.93
    " cognition"  0.91
    " cognit"  0.88
    " mental"  0.86
    " Reasoning"  0.80
    " cognitiva"  0.80
    " Mental"  0.80
    Activation Density: 0.042%

    No Known Activations