INDEX
Explanations
phrases indicating a comparison or contrast
phrases beginning with "by," indicating attribution or causation
Negative Logits
Token       Logit
ounter      -0.65
ptions      -0.64
digs        -0.63
istan       -0.62
ILCS        -0.61
earable     -0.59
ptive       -0.58
imal        -0.58
stadt       -0.54
ylum        -0.53
Positive Logits
Token         Logit
products      1.28
virtue        1.16
akuya         1.07
laws          1.03
product       1.02
implication   1.01
catch         0.96
gone          0.96
extension     0.91
default       0.86
Activations Density 0.083%