Explanations
references to the word "ms" at different activation levels
Negative Logits
redo      -0.70
Rican     -0.69
OPLE      -0.66
Rico      -0.64
Zoro      -0.63
Strauss   -0.63
acids     -0.62
acid      -0.60
ASED      -0.60
Ram       -0.59
Positive Logits
pace      1.06
achus     1.04
erver     1.03
giving    0.98
mble      0.95
manship   0.94
creen     0.92
fw        0.92
ophon     0.90
cript     0.89
Activations Density: 0.020%