INDEX
Explanations
comparisons of quantity between different scenarios
phrases that indicate cause and effect relationships
Negative Logits
anka      -0.72
osate     -0.61
laim      -0.57
usp       -0.57
alach     -0.55
Origin    -0.55
stros     -0.54
iasco     -0.54
bara      -0.54
ossus     -0.54
Positive Logits
fewer     1.75
less      1.59
clearer   1.54
nicer     1.53
quicker   1.52
sharper   1.50
richer    1.49
shorter   1.47
easier    1.46
smoother  1.46
Activations Density: 0.895%