INDEX
Explanations
the word "arm" with a high level of activation
instances of the word "arm."
Negative Logits
ween       -0.71
Vide       -0.70
Forsaken   -0.64
flush      -0.62
moot       -0.61
payday     -0.60
elig       -0.59
LOAD       -0.59
Skinner    -0.59
Atlantis   -0.58
Positive Logits
ageddon    1.41
aceutical  1.12
ament      1.08
onica      0.98
atures     0.97
ony        0.97
ichael     0.93
illary     0.93
aments     0.92
achine     0.92
Activations Density 0.009%