Explanations
The word "highest", with varying activation strength
References to the concept of "highest" in various contexts
Negative Logits
Token     Logit
agra      -0.70
icer      -0.63
neg       -0.63
redo      -0.63
ca        -0.62
kas       -0.60
neys      -0.60
vous      -0.60
bal       -0.60
md        -0.59
Positive Logits
Token           Logit
bidder          1.01
hest            0.85
pinnacle        0.83
concentrations  0.82
rated           0.82
extent          0.82
imaginable      0.81
proport         0.77
possible        0.77
practicable     0.77
Activation density: 0.014%
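The two logit lists and the density figure are summary statistics over a single feature. The sketch below shows one plausible way to derive them. It assumes the feature's per-token logit effects are already available as a token-to-value mapping, and it assumes "active" means an activation strictly greater than zero. The function names `top_and_bottom` and `activation_density` are hypothetical and are not part of any dashboard API.

```python
# Hypothetical sketch: rank one feature's per-token logit effects into the
# "Positive Logits" and "Negative Logits" lists, and compute its activation
# density. Assumes logit effects are given as a {token: value} dict and that
# a token counts as "active" when its activation is > 0.

def top_and_bottom(logit_effects: dict[str, float], k: int):
    """Return the k most positive and the k most negative (token, value) pairs."""
    ranked = sorted(logit_effects.items(), key=lambda kv: kv[1])  # ascending
    bottom = ranked[:k]                   # most negative first
    top = list(reversed(ranked[-k:]))     # most positive first
    return top, bottom


def activation_density(activations: list[float]) -> float:
    """Fraction of tokens on which the feature is active (activation > 0)."""
    if not activations:
        return 0.0
    return sum(1 for a in activations if a > 0) / len(activations)


# A small subset of the values listed above.
effects = {"bidder": 1.01, "hest": 0.85, "pinnacle": 0.83,
           "agra": -0.70, "icer": -0.63, "neg": -0.63}
top, bottom = top_and_bottom(effects, 2)
print(top)     # [('bidder', 1.01), ('hest', 0.85)]
print(bottom)  # [('agra', -0.7), ('icer', -0.63)]

# Illustrative only: 14 active tokens out of 100,000 gives a density of 0.014%.
density = activation_density([1.0] * 14 + [0.0] * 99_986)
print(f"{density:.3%}")  # 0.014%
```

A density of 0.014% means the feature fires on roughly 1 token in 7,000, which is consistent with a narrow trigger like the word "highest".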