Explanations
the prefix "ultim-" at various degrees of activation
instances of the suffix "im" indicating the prefix of various words
Negative Logits
lace        -0.67
ĸļ          -0.67
DERR        -0.65
hurst       -0.64
chel        -0.64
Lon         -0.62
Holland     -0.61
Houses      -0.61
grass       -0.60
pard        -0.59
Positive Logits
onial       1.17
ization     1.09
isations    1.08
izations    1.08
izes        1.08
izing       1.06
ized        1.05
ises        0.99
etric       0.97
ulation     0.95
Activations Density 0.070%