INDEX
Explanations
phrases indicating suggestions or implications
suggestive phrases that indicate reasoning or conclusion
Negative Logits
fare        -0.75
uss         -0.69
EEE         -0.68
presided    -0.67
ppa         -0.63
skill       -0.61
gie         -0.60
fought      -0.60
FO          -0.60
imb         -0.59
Positive Logits
ively        0.78
indications  0.76
indicators   0.71
evidence     0.70
uggest       0.70
unct         0.70
kinson       0.69
ered         0.68
ausible      0.68
ifiable      0.67
Activations Density: 0.068%