INDEX
Explanations
references to significant changes or reforms
Negative Logits
spyOn       -0.41
cerol       -0.40
acamata     -0.39
playing     -0.38
Adler       -0.38
matmul      -0.38
iligt       -0.38
eder        -0.37
son         -0.37
instances   -0.37
Positive Logits
Changes      1.32
changes      1.30
Changes      1.30
changes      1.28
CHANGES      1.20
change       1.20
CHANGES      1.13
Change       1.11
changement   1.09
cambios      1.08
Activations Density: 0.459%