INDEX
Explanations
references to influential literary figures and their works
Negative Logits
194         -0.20
Hitler      -0.19
Broadcast   -0.18
jeep        -0.17
atica       -0.17
WWII        -0.17
（昭和      -0.16
garage      -0.16
eza         -0.16
193         -0.16
Positive Logits
Victorian   0.61
184         0.56
185         0.56
183         0.56
182         0.53
186         0.51
181         0.45
187         0.43
nineteenth  0.43
Victor      0.35
Activations Density 0.320%