Explanations
The word "mo", with high activation values
Negative Logits
Scand         -0.64
emort         -0.59
circles       -0.58
©¶æ           -0.55
Sutherland    -0.54
utenberg      -0.52
quarters      -0.52
lodge         -0.52
hospital      -0.51
pg            -0.51
Positive Logits
ighed          0.94
oused          0.92
ousing         0.89
ciating        0.86
asion          0.86
lement         0.84
ufact          0.84
ishment        0.83
uates          0.82
itably         0.82
Activation Density: 0.064%