INDEX
    Explanations

    phrases indicating uncertainty or questioning established perceptions

    Head Attr Weights
    0:0.02
    1:0.01
    2:0.27
    3:0.13
    4:0.10
    5:0.04
    6:0.11
    7:0.03
    8:0.04
    9:0.09
    10:0.07
    11:0.03
    Negative Logits
     Chase     -1.50
     Scare     -1.37
     Argon     -1.36
     Haas      -1.36
     Drivers   -1.34
     alike     -1.33
     Nico      -1.32
     Mono      -1.32
     Karin     -1.31
     Brid      -1.30
    Positive Logits
    "]=>       2.11
     sqor      1.98
     裏�        1.90
    EStream    1.72
    manuel     1.67
    ゼウス        1.65
    inion      1.61
    obyl       1.61
    factor     1.60
    culus      1.58
    Act Density 0.031%

    No Known Activations