INDEX
Explanations
repeated usage of the verb "do" in various contexts
Negative Logits
appet        -0.68
dynam        -0.67
stead        -0.64
replication  -0.64
stimul       -0.64
demolition   -0.60
sleeper      -0.60
tranquil     -0.59
syn          -0.59
reson        -0.58
Positive Logits
Malf         0.83
ammy         0.81
ffe          0.72
orf          0.72
dc           0.69
pose         0.68
FFER         0.67
rant         0.67
ented        0.66
ensed        0.65
Activations Density 0.054%