INDEX
Explanations
references to different weights or weight-related terms
references to weight in various contexts
Negative Logits
ICLE       -0.84
BLE        -0.82
VIS        -0.73
Noir       -0.73
Mb         -0.73
Suc        -0.72
xon        -0.70
Kidd       -0.69
UTE        -0.69
Gutenberg  -0.68
Positive Logits
weights    1.05
lifting    1.03
weight     0.98
weight     0.96
heaviest   0.93
weights    0.92
heavier    0.89
burdens    0.89
iless      0.87
lifting    0.86
Activations Density: 0.021%