INDEX
Explanations
numerical values related to significant measurements or quantities
Negative Logits
able      -0.17
ands      -0.15
ctrine    -0.15
ering     -0.14
edImage   -0.14
iene      -0.14
ucch      -0.14
تمر       -0.14
bere      -0.13
ront      -0.13
Positive Logits
0         0.39
00        0.29
8         0.29
9         0.29
7         0.28
5         0.27
6         0.27
4         0.26
3         0.26
2         0.24
Activations Density 0.138%