INDEX
Explanations
phrases related to imitation, learning, and instruction
conjunctions and their association with various actions or states
Negative Logits
ledged    -0.77
scribe    -0.75
arton     -0.71
pedia     -0.68
aceae     -0.66
rued      -0.66
igun      -0.66
lated     -0.65
cised     -0.65
ize       -0.65
Positive Logits
letting         1.72
putting         1.68
making          1.62
delivering      1.61
creating        1.60
adapting        1.58
gaining         1.58
distributing    1.58
getting         1.57
discovering     1.57
Activations Density: 0.352%