INDEX
Explanations
numerical values and their associated units of measurement
Negative Logits
Token      Logit
831        -0.17
loat       -0.16
egral      -0.16
94         -0.14
arsi       -0.14
enne       -0.14
ats        -0.14
pom        -0.14
exactly    -0.14
977        -0.13
Positive Logits
Token      Logit
350        0.34
150        0.30
800        0.28
250        0.28
300        0.28
450        0.27
400        0.27
750        0.26
700        0.26
120        0.26
Activations Density: 0.224%