INDEX
Explanations
numerical values, particularly focusing on the number 90
Negative Logits
seeing     -0.83
lay        -0.77
untu       -0.72
wan        -0.72
(blank)    -0.72
think      -0.70
cles       -0.70
lish       -0.68
wered      -0.66
RTX        -0.64
Positive Logits
%"         1.03
%          0.98
percent    0.98
%:         0.94
ILCS       0.94
°          0.94
º          0.93
degrees    0.92
mph        0.88
%;         0.86
Activations Density 0.031%