INDEX
Explanations
Attends from arbitrary tokens to tokens related to specific characters and structures
Head Attr Weights
0:0.09
1:0.10
2:0.12
3:0.14
4:0.10
5:0.06
6:0.21
7:0.14
Negative Logits
us              -0.28
Hochspringen    -0.27
                -0.23
e               -0.23
E               -0.23
am              -0.22
…               -0.21
میان            -0.21
sik             -0.20
CloseOperation  -0.20
Positive Logits
wieś            0.40
Ressource       0.38
myſelf          0.35
ειτουργ         0.34
becauſe         0.33
caufe           0.32
whoſe           0.32
poffible        0.32
bezeichneter    0.32
+#+#            0.32
Activations Density 0.000%