Explanations
instances of comparisons suggesting a decrease or reduction in magnitude
phrases indicating a comparison or measure of quantity
Negative Logits
Token           Logit
TRY             -0.69
Reconstruction  -0.66
Origins         -0.66
DD              -0.61
AE              -0.61
den             -0.61
âĹ¼             -0.60
kamp            -0.57
DK              -0.57
RL              -0.57
Positive Logits
Token           Logit
ened            0.98
than            0.96
ening           0.87
thumbnails      0.82
Than            0.75
ainers          0.72
expensive       0.72
ons             0.70
ensive          0.69
fortunate       0.69
Activations Density: 0.032%