INDEX
Explanations
the abbreviation 'am' followed by a number
Negative Logits
kinderg       -0.63
puberty       -0.63
steroids      -0.62
stakes        -0.60
strawberries  -0.60
flush         -0.60
affairs       -0.59
melts         -0.58
takeaway      -0.58
lift          -0.58
Positive Logits
endment       1.31
nesty         1.25
essage        1.11
bitious       1.10
ilies         1.08
sterdam       1.08
azing         1.06
munition      1.05
otor          1.04
icro          1.03
Activations Density 0.030%