INDEX
Explanations
names of historical figures or events related to monarchies
Negative Logits
189             -0.17
INTERRUPTION    -0.17
190             -0.17
ンガ            -0.16
idunt           -0.16
Alfred          -0.16
191             -0.16
194             -0.15
pian            -0.15
187             -0.15
Positive Logits
161             0.41
162             0.40
164             0.40
166             0.40
163             0.40
159             0.39
165             0.38
167             0.36
160             0.35
169             0.34
Activations Density 0.310%