Explanations
attends to digit tokens from numerical token indices
Head Attribution Weights
0:0.15
1:0.31
2:0.17
3:0.06
4:0.08
5:0.06
6:0.06
7:0.07
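The listed head weights sum to 0.96 rather than exactly 1, possibly because of rounding. The following is a minimal sketch that ranks the heads and expresses one head's weight as a share of the listed total. It uses only the numbers shown above; the helper names `rank_heads` and `share_of_total` are hypothetical.

```python
# Head attribution weights as listed on this dashboard (head index -> weight).
head_weights = {0: 0.15, 1: 0.31, 2: 0.17, 3: 0.06, 4: 0.08, 5: 0.06, 6: 0.06, 7: 0.07}


def rank_heads(weights):
    """Return (head, weight) pairs sorted from largest to smallest weight."""
    return sorted(weights.items(), key=lambda kv: kv[1], reverse=True)


def share_of_total(weights, head):
    """Fraction of the summed listed weight carried by a single head."""
    return weights[head] / sum(weights.values())


ranked = rank_heads(head_weights)
top_head, top_weight = ranked[0]
print(f"top head: {top_head} (weight {top_weight})")
print(f"share of listed total: {share_of_total(head_weights, top_head):.2f}")
```

Head 1 carries the largest weight. Renormalizing by the listed total raises its share only slightly, to about 0.32.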
Negative Logits
帖最后由  -0.31
okuyayım  -0.26
IsContent  -0.24
ID  -0.23
Datuak  -0.23
indiv  -0.23
('/  -0.23
AccessorTable  -0.22
Mohammed  -0.22
Full  -0.22
Positive Logits
يكب  0.39
hyrchwyd  0.39
zeitig  0.37
issory  0.37
ujednoznacz  0.36
tomation  0.35
viders  0.35
Enllaços  0.34
opedic  0.33
Petru  0.33
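The dashboard does not state how these logit lists are computed. A common approach, assumed here, is direct logit attribution: project the feature's output direction (a vector in the residual stream) through the unembedding matrix `W_U`, then take the k most negative and k most positive entries. The function `logit_lists` and the toy `W_U`, `direction`, and `vocab` below are hypothetical and are not taken from this dashboard.

```python
import numpy as np


def logit_lists(direction, W_U, vocab, k=10):
    """Return the k most negative and k most positive direct logit effects.

    Assumption: a feature's effect on each output token is its output
    direction projected through the unembedding matrix W_U.
    """
    effects = direction @ W_U  # one scalar effect per vocabulary token
    order = np.argsort(effects)  # ascending: most negative first
    negative = [(vocab[i], float(effects[i])) for i in order[:k]]
    positive = [(vocab[i], float(effects[i])) for i in order[::-1][:k]]
    return negative, positive


# Toy example: a 2-dimensional residual stream and a 4-token vocabulary.
vocab = ["a", "b", "c", "d"]
W_U = np.array([[1.0, -1.0, 0.5, 0.0],
                [0.0, 2.0, 0.5, -3.0]])
direction = np.array([1.0, 0.5])
neg, pos = logit_lists(direction, W_U, vocab, k=2)
print("negative:", neg)
print("positive:", pos)
```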
Activation Density: 0.380%
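Activation density is read here as the fraction of token positions on which the feature fires. A density of 0.380% would then mean roughly 38 active positions per 10,000 tokens. The following is a minimal sketch under that assumption. It treats any activation above a hypothetical `threshold` of 0.0 as "active" and uses toy activations rather than this feature's data.

```python
import numpy as np


def activation_density(activations, threshold=0.0):
    """Fraction of token positions whose activation exceeds the threshold."""
    acts = np.asarray(activations)
    return float((acts > threshold).mean())


# Toy example: the feature is active on 1 of 8 token positions.
acts = np.array([0.0, 0.0, 2.3, 0.0, 0.0, 0.0, 0.0, 0.0])
print(f"{activation_density(acts):.3%}")
```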