Explanations
Instances of the word "told".
Head Attr Weights
Head  Weight
0     0.08
1     0.09
2     0.08
3     0.07
4     0.08
5     0.08
6     0.08
7     0.07
8     0.08
9     0.09
10    0.07
11    0.09
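The Head Attr Weights above are spread almost evenly across the 12 heads, each close to 1/12 ≈ 0.083. The sketch below shows one plausible way such weights could be computed. It assumes the feature comes from an SAE trained on the concatenated per-head attention outputs, so the feature's decoder row splits into n_heads equal slices. It also assumes each head's weight is that slice's share of the summed L2 norms. The helper name head_attr_weights and the toy decoder row are hypothetical, and the dashboard's actual normalization may differ.

```python
import math
from typing import List, Sequence


def head_attr_weights(decoder_row: Sequence[float], n_heads: int) -> List[float]:
    """Hypothetical helper: split a decoder row into n_heads equal slices and
    return each slice's share of the total L2 norm (assumed normalization)."""
    if len(decoder_row) % n_heads != 0:
        raise ValueError("decoder_row length must be divisible by n_heads")
    d_head = len(decoder_row) // n_heads
    # L2 norm of the decoder weights belonging to each head.
    norms = [
        math.sqrt(sum(x * x for x in decoder_row[h * d_head:(h + 1) * d_head]))
        for h in range(n_heads)
    ]
    total = sum(norms)
    if total == 0:
        return [0.0] * n_heads
    # Normalize so the weights sum to 1.
    return [n / total for n in norms]


if __name__ == "__main__":
    # Toy decoder row: 12 heads x 2 dims, identical norms per head (made-up values).
    row = [1.0, 0.0] * 12
    print([round(w, 2) for w in head_attr_weights(row, n_heads=12)])
```

With identical norms per head, every weight rounds to 0.08, close to the near-uniform profile shown for this feature.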
Negative Logits
Token            Logit
onto             -1.49
usc              -1.40
sidew            -1.38
Accessory        -1.38
slots            -1.37
UFC              -1.36
slug             -1.34
formations       -1.33
corresponding    -1.29
Ich              -1.28
Positive Logits
Token            Logit
entious           1.82
ר                 1.57
ocalyptic         1.54
iasco             1.52
soDeliveryDate    1.50
cipled            1.50
sexism            1.49
zos               1.47
̶                 1.46
hindsight         1.43
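The positive and negative logit lists above rank vocabulary tokens by how much the feature's output direction raises or lowers their logits. Here is a minimal sketch of that ranking. It assumes the feature's direction has already been expressed in the residual stream (length d_model) and is multiplied directly by an unembedding matrix W_U, with no layer-norm scaling or other preprocessing. The function names, the toy W_U, and the three-token vocabulary are illustrative assumptions, not the dashboard's actual pipeline.

```python
from typing import List, Sequence, Tuple


def logit_effects(direction: Sequence[float], W_U: Sequence[Sequence[float]]) -> List[float]:
    """Compute direction @ W_U, with W_U given as d_model rows of length vocab."""
    d_model = len(direction)
    vocab = len(W_U[0])
    return [sum(direction[i] * W_U[i][t] for i in range(d_model)) for t in range(vocab)]


def top_and_bottom(
    effects: Sequence[float], tokens: Sequence[str], k: int
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most positive and k most negative (token, effect) pairs."""
    ranked = sorted(zip(tokens, effects), key=lambda pair: pair[1])
    negative = ranked[:k]        # most negative first
    positive = ranked[::-1][:k]  # most positive first
    return positive, negative


if __name__ == "__main__":
    direction = [1.0, -1.0]   # toy feature direction (d_model = 2)
    W_U = [[0.5, 2.0, -1.0],  # toy unembedding, shape (2, 3)
           [1.0, 0.0, 0.5]]
    tokens = ["a", "b", "c"]  # toy vocabulary
    pos, neg = top_and_bottom(logit_effects(direction, W_U), tokens, k=1)
    print("Positive:", pos)   # [('b', 2.0)]
    print("Negative:", neg)   # [('c', -1.5)]
```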
Activation Density: 0.000%
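Activation density is commonly taken to be the percentage of sampled tokens on which the feature's activation is positive. The sketch below assumes that definition, a zero threshold, and a hypothetical helper named activation_density. It also illustrates why a density shown to three decimal places can read 0.000% even if the feature fires occasionally.

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Hypothetical helper: percentage of tokens whose activation exceeds
    `threshold` (assumed definition of activation density)."""
    acts = list(activations)
    if not acts:
        return 0.0
    firing = sum(1 for a in acts if a > threshold)
    return 100.0 * firing / len(acts)


if __name__ == "__main__":
    # Toy data: the feature fires on 1 token out of 1,000,000.
    acts = [0.0] * 999_999 + [3.2]
    print(f"Activation density: {activation_density(acts):.3f}%")
```

With one firing token in a million, the density is 0.0001%, which formats as 0.000% at three decimal places.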