INDEX
Explanations
attends to European tokens from related guideline tokens
Head Attr Weights
0:0.08
1:0.10
2:0.09
3:0.12
4:0.11
5:0.06
6:0.23
7:0.17
Negative Logits
дописавши: -0.31
InjectMocks: -0.28
Brandenburg: -0.26
ęp: -0.26
も行: -0.26
literals: -0.26
ciuto: -0.25
addElement: -0.25
nand: -0.25
TargetApi: -0.25
Positive Logits
principalColumn: 0.41
purpoſe: 0.35
hunne: 0.35
Personensuche: 0.34
auffi: 0.34
pleaſure: 0.34
reaſon: 0.34
ſelves: 0.33
ſelf: 0.33
juſ: 0.33
Activations Density: 0.494%