INDEX
Explanations
references to evolution
the repetition of the token "ev" at varying activation levels
Negative Logits
Token        Logit
Whitman      -0.86
matically    -0.62
corpor       -0.62
mature       -0.61
TRY          -0.61
actionGroup  -0.61
hitter       -0.61
common       -0.60
(blank)      -0.60
Bermuda      -0.60
Positive Logits
Token        Logit
apor         1.15
itability    1.14
olution      1.10
ev           1.10
iev          1.09
olve         1.07
rov          1.07
idently      1.05
ice          1.02
idential     1.01
Activations Density 0.005%