INDEX
Explanations
phrases related to establishing norms or examples
phrases relating to setting standards, examples, or precedents
Negative Logits
ividual     -0.69
ugg         -0.66
orge        -0.66
leness      -0.65
jug         -0.63
Sunder      -0.62
jj          -0.59
compr       -0.59
outweigh    -0.59
ér          -0.59
Positive Logits
precedent    1.28
tone         1.11
precedence   1.03
tle          1.00
benchmark    0.99
flame        0.98
example      0.94
benchmarks   0.94
preced       0.93
record       0.93
Activations Density 0.061%