Explanations
verbs that indicate causation or influence
Negative Logits
anooga       -0.59
fortun       -0.53
lett         -0.44
oret         -0.41
resy         -0.41
jon          -0.41
FX           -0.41
Prediction   -0.41
fucked       -0.40
Legions      -0.40
Positive Logits
stumble       0.70
pursue        0.67
realize       0.64
ponder        0.62
ォ             0.60
rethink       0.60
contemplate   0.60
reconsider    0.60
othy          0.60
conclude      0.60
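Assuming these logit values measure the feature's direct effect on the output logits, a common way dashboards compute them is to project the feature's decoder direction onto each token's unembedding vector, then list the largest and smallest results. The sketch below assumes that method; the vectors are toy, hypothetical values rather than real model weights, and `logit_weights` and `rank_logits` are illustrative names, not a library API.

```python
# Minimal sketch (assumption): a feature's logit weight for a token is the dot
# product of the feature's decoder direction with that token's unembedding
# vector. All vectors below are toy, hypothetical values, not real weights.
from typing import Dict, List, Tuple

Vector = List[float]
Ranked = List[Tuple[str, float]]


def logit_weights(decoder_dir: Vector, unembed: Dict[str, Vector]) -> Dict[str, float]:
    """Project the decoder direction onto each token's unembedding vector."""
    return {
        token: sum(d * u for d, u in zip(decoder_dir, column))
        for token, column in unembed.items()
    }


def rank_logits(weights: Dict[str, float], k: int) -> Tuple[Ranked, Ranked]:
    """Return the k most positive and the k most negative (token, weight) pairs."""
    ascending = sorted(weights.items(), key=lambda item: item[1])
    negative = ascending[:k]        # most negative first
    positive = ascending[::-1][:k]  # most positive first
    return positive, negative


if __name__ == "__main__":
    decoder_dir = [1.0, -0.5]  # hypothetical 2-d feature direction
    unembed = {                # hypothetical 2-d unembedding columns
        "ponder": [0.4, -0.4],
        "rethink": [0.3, -0.2],
        "the": [0.1, 0.2],
        "fortun": [-0.2, 0.2],
        "anooga": [-0.3, 0.6],
    }
    weights = logit_weights(decoder_dir, unembed)
    positive, negative = rank_logits(weights, k=2)
    print("positive:", [token for token, _ in positive])  # ['ponder', 'rethink']
    print("negative:", [token for token, _ in negative])  # ['anooga', 'fortun']
```

Sorting once and slicing both ends keeps the sketch short; over a full vocabulary, `heapq.nlargest` and `heapq.nsmallest` would avoid sorting every token.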
Activation Density: 9.699%
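Activation density is commonly reported as the share of sampled tokens on which a feature fires. The sketch below assumes "fires" means an activation strictly greater than zero; `activation_density` is a hypothetical helper and the sample activations are toy values, not data from this feature.

```python
# Minimal sketch (assumption): activation density = percentage of sampled
# tokens on which the feature's activation is strictly positive.
from typing import Sequence


def activation_density(activations: Sequence[float]) -> float:
    """Return the percentage of activations that are greater than zero."""
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > 0)
    return 100.0 * active / len(activations)


if __name__ == "__main__":
    # Toy, hypothetical activations over 8 tokens; 1 of the 8 is non-zero.
    sample = [0.0, 0.0, 2.3, 0.0, 0.0, 0.0, 0.0, 0.0]
    print(f"{activation_density(sample):.3f}%")  # 12.500%
```

Under this definition, a density of 9.699% means the feature is active on roughly one token in ten.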