INDEX
Explanations
words containing the substring "ac" with varying activation strengths
prefixes that indicate action or occurrence
Negative Logits
SHIP           -0.80
shorth         -0.77
Piper          -0.76
Spa            -0.74
è¦ļéĨĴ         -0.73
Shots          -0.72
©¶æ¥µ          -0.71
Painter        -0.70
Christensen    -0.70
POW            -0.68
Positive Logits
ception        1.20
prise          1.05
istant         1.04
ertain         1.01
pect           1.00
mosp           0.97
vance          0.95
ploy           0.94
ighty          0.93
usterity       0.93
Activations Density 0.111%
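
The density figure above is not defined on this page. A minimal sketch, assuming it means the fraction of sampled tokens on which the feature fires (activation above zero); the function names and the zero threshold are hypothetical, not taken from the source:

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Fraction of tokens whose activation exceeds `threshold`.

    Assumption: "density" counts tokens with activation > 0. The page
    itself does not state the definition or the threshold used.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


def format_percent(fraction: float) -> str:
    """Render a fraction as a percentage with three decimal places."""
    return f"{fraction * 100:.3f}%"


# Illustrative data only: one active token out of 900.
sample = [0.0] * 899 + [2.5]
print(format_percent(activation_density(sample)))
```

Under this assumed definition, a density of 0.111% corresponds to roughly one active token in every 900.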