Explanations
word pairs with repetition for emphasis or accuracy
Negative Logits
Token      Logit
orthy      -0.90
uga        -0.84
atis       -0.80
Authors    -0.78
Stage      -0.76
dor        -0.75
onso       -0.74
icans      -0.74
erves      -0.71
esta       -0.70
Positive Logits
Token             Logit
ratio             1.11
comparisons       1.06
conversion        1.04
ratios            1.00
correspondence    1.00
conversions       0.98
converter         0.91
transmission      0.89
approach          0.88
comparison        0.88
Activations Density: 0.145%