INDEX
Explanations
pieces of text referring to attention and different associated contexts
variations of the word "Attention."
Negative Logits
Token           Logit
20439           -0.80
FTWARE          -0.73
ãĥīãĥ©ãĤ´ãĥ³    -0.70
çĦ              -0.69
REDACTED        -0.68
Rwanda          -0.67
å§«             -0.65
76561           -0.65
Bene            -0.63
sliding         -0.63
Positive Logits
Token           Logit
etic            1.09
ention          1.08
anooga          1.05
ension          1.04
ert             1.03
ributes         1.00
ributed         0.99
oir             0.98
anas            0.97
assin           0.97
Activations Density 0.006%