Explanations
Attends from more general tokens to specific tokens that signal larger concepts or ideas.
Head Attr Weights
0:0.09
1:0.11
2:0.10
3:0.07
4:0.07
5:0.02
6:0.16
7:0.34
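As a rough illustration, the listing above can be read as `head:weight` pairs and summarized programmatically. This is a minimal sketch, not the dashboard's own tooling: the names `RAW` and `parse_head_weights` are hypothetical, and the `head:weight` interpretation of each line is an assumption.

```python
# Hypothetical helper: assumes each line is "<head index>:<attribution weight>".
RAW = """0:0.09
1:0.11
2:0.10
3:0.07
4:0.07
5:0.02
6:0.16
7:0.34"""


def parse_head_weights(text):
    """Parse 'head:weight' lines into a {head_index: weight} dict."""
    weights = {}
    for line in text.splitlines():
        head, value = line.split(":")
        weights[int(head)] = float(value)
    return weights


weights = parse_head_weights(RAW)
top_head = max(weights, key=weights.get)  # head with the largest weight
total = sum(weights.values())             # listed weights need not sum to 1

print(top_head, weights[top_head], round(total, 2))
```

Read this way, head 7 carries the largest listed weight (0.34), followed by head 6 (0.16).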
Negative Logits
resourceCulture   -0.30
Hentet            -0.26
expandindo        -0.24
ExecuteAsync      -0.24
فريبيس            -0.22
형                 -0.22
taj               -0.22
quanto            -0.22
intur             -0.21
ushort            -0.21
Positive Logits
+:+               0.37
HasAnnotation     0.32
COUVER            0.31
Referencies       0.31
validamos         0.30
utilisons         0.30
Espèce            0.29
HasFactory        0.29
nonatomic         0.29
betweenstory      0.29
Activations Density 0.834%