INDEX
Explanations
nerve, nervous, endings, function, agent
Negative Logits
AMO       0.39
阌        0.39
Learning  0.38
omn       0.38
angnya    0.38
Kerry     0.38
negative  0.38
Negative  0.37
Deferred  0.37
omn       0.37
Positive Logits
nerve     0.92
Nerve     0.87
Nerv      0.81
nerv      0.80
nerves    0.75
нер       0.73
endings   0.72
wreck     0.68
nerfs     0.64
nervoso   0.64
Activations Density 0.005%