INDEX
Explanations
Attends to numeric values associated with certain features, from specific tokens related to features or references.
Head Attr Weights
0:0.07
1:0.10
2:0.14
3:0.09
4:0.07
5:0.05
6:0.22
7:0.22
Negative Logits
expandindo  -0.58
دانشنامهٔ  -0.52
مرئيه  -0.41
aarrggbb  -0.38
abestanden  -0.37
RetentionPolicy  -0.37
bcryptjs  -0.36
PathVariable  -0.36
elemField  -0.35
createCell  -0.35
Positive Logits
putin  0.34
SequentialGroup  0.31
REP  0.30
onalds  0.28
Warszawa  0.28
idf  0.28
PrintWriter  0.28
PhysRevD  0.28
哒  0.28
goog  0.27
Activations Density 0.002%