INDEX
Explanations
Attends to review- and assessment-related tokens from evidence- or information-related tokens.
Head Attr Weights
0:0.09
1:0.11
2:0.11
3:0.08
4:0.06
5:0.02
6:0.13
7:0.36
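The head attribution weights listed above can be compared programmatically. The following is a minimal sketch, assuming the `head:weight` pairs are copied verbatim from the listing and that a larger weight means a larger share of this feature's attribution; the helper names `dominant_head` and `normalized` are hypothetical and not part of any dashboard API.

```python
# Head attribution weights copied from the listing above (head index -> weight).
head_weights = {0: 0.09, 1: 0.11, 2: 0.11, 3: 0.08, 4: 0.06, 5: 0.02, 6: 0.13, 7: 0.36}


def dominant_head(weights):
    """Return the (head, weight) pair with the largest weight."""
    return max(weights.items(), key=lambda kv: kv[1])


def normalized(weights):
    """Rescale the weights so they sum to 1 (the listed values are rounded)."""
    total = sum(weights.values())
    return {head: w / total for head, w in weights.items()}


top_head, top_weight = dominant_head(head_weights)
total = sum(head_weights.values())
shares = normalized(head_weights)

print(f"dominant head: {top_head} (weight {top_weight:.2f})")
print(f"sum of listed weights: {total:.2f}")
print(f"normalized share of dominant head: {shares[top_head]:.3f}")
```

Because the listed weights are rounded to two decimals, their sum need not be exactly 1, which is why the sketch renormalizes before comparing shares.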
Negative Logits
cre         -0.23
SPIRE       -0.23
–           -0.22
filepath    -0.21
bu          -0.21
έ           -0.21
LUMP        -0.20
du          -0.20
Hover       -0.20
des         -0.20
Positive Logits
Efq         0.45
houſe       0.44
Monfieur    0.43
myſelf      0.42
himſelf     0.42
themſelves  0.41
ſever       0.41
ſeveral     0.41
intios      0.40
ſelf        0.40
Activations Density: 0.159%