    Explanations

    qualifications after 'though'

    Negative Logits
    0.50
    0.44
    (\
    0.43
    0.42
    (
    0.42
    。「
    0.41
    0.40
    0.39
    	
    0.38
    ̣
    0.38
    Positive Logits
    !).            0.82
    !),            0.75
    !)             0.73
    !!)            0.68
     übrigens      0.61
     admittedly    0.60
    ?).            0.59
    ...).          0.59
    !);            0.59
     되겠죠        0.59
    Act Density 0.140%

    No Known Activations