INDEX
Explanations
phrases indicating important consequences or impacts
statements about consequences or effects
Negative Logits
fighters  -0.75
MQ        -0.72
cop       -0.70
cker      -0.68
fred      -0.68
bug       -0.66
few       -0.66
ced       -0.66
bows      -0.66
VERSION   -0.65
Positive Logits
implications   0.89
beyond         0.84
romeda         0.82
ramifications  0.81
ogene          0.76
uality         0.71
arising        0.71
afety          0.69
ripple         0.68
consequential  0.67
Activations Density: 0.053%