INDEX
Explanations
the word "couldn't" with high activation values
instances of the word "couldn't."
Negative Logits
oak         -0.69
dress       -0.67
protected   -0.65
liberated   -0.62
backer      -0.62
croft       -0.61
ULT         -0.60
PU          -0.60
deposition  -0.58
otype       -0.58
Positive Logits
't          1.51
adian       0.96
afford      0.91
atio        0.90
kered       0.90
ayan        0.85
anke        0.84
feas        0.84
ilater      0.80
ieve        0.79
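The Positive/Negative Logits lists are commonly read as the feature's direct effect on next-token predictions: which vocabulary tokens it pushes up or down when it fires. The following is a minimal sketch of how such a ranking can be produced. It assumes the feature's decoder direction is projected straight through the model's unembedding matrix with no intermediate layer norm. `top_logits`, `decoder_dir`, and `W_U` are hypothetical names, and the toy matrix is made up rather than taken from this feature.

```python
import numpy as np

def top_logits(decoder_dir, W_U, vocab, k=10):
    """Rank vocabulary tokens by one feature direction's direct logit effect.

    Assumption: the ranking is decoder_dir @ W_U, with no layer norm applied.
    decoder_dir: (d_model,) decoder vector for the feature (hypothetical input).
    W_U: (d_model, d_vocab) unembedding matrix (hypothetical input).
    vocab: list of d_vocab token strings.
    """
    logits = decoder_dir @ W_U            # (d_vocab,) logit contribution per token
    order = np.argsort(logits)            # ascending: most negative first
    bottom = [(vocab[i], float(logits[i])) for i in order[:k]]
    top = [(vocab[i], float(logits[i])) for i in order[::-1][:k]]
    return top, bottom

# Toy example with made-up numbers (d_model=2, 4-token vocabulary).
W_U = np.array([[1.0, 0.0, -1.0, 0.5],
                [0.0, 1.0,  0.0, 0.5]])
decoder_dir = np.array([1.0, 2.0])
top, bottom = top_logits(decoder_dir, W_U, ["a", "b", "c", "d"], k=2)
print(top)     # [('b', 2.0), ('d', 1.5)]
print(bottom)  # [('c', -1.0), ('a', 1.0)]
```

The lists above would correspond to `top` and `bottom` with `k=10`, one entry per subword token (hence fragments like `'t` and `adian`).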
Activation Density: 0.012%
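Activation density is the share of sampled tokens on which the feature fires. A density of 0.012% corresponds to roughly 12 firings per 100,000 tokens. The following is a minimal sketch of that calculation. It assumes "fires" means an activation strictly above zero, which is a common SAE convention; the dashboard's exact threshold and token sample are not stated here. `activation_density` is a hypothetical helper, and the activations are synthetic.

```python
import numpy as np

def activation_density(activations, threshold=0.0):
    """Fraction of token positions where the feature activation exceeds threshold.

    Assumption: threshold=0.0, i.e. any positive activation counts as firing.
    """
    acts = np.asarray(activations, dtype=float)
    return float(np.mean(acts > threshold))

# Synthetic activations: the feature fires on 12 of 100,000 tokens.
acts = np.zeros(100_000)
acts[:12] = 3.0
density = activation_density(acts)
print(f"{density:.3%}")  # 0.012%
```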