Explanations
names or terms related to historical figures or events
references to historical terms and figures related to the German Reich
Negative Logits
neau      -0.70
glers     -0.68
plates    -0.66
gdala     -0.66
BILITY    -0.65
ABLE      -0.65
ulton     -0.64
ctica     -0.64
ocre      -0.64
Tibetan   -0.62
Positive Logits
stein      0.87
sb         0.85
hardt      0.85
lich       0.84
Reich      0.84
swer       0.84
enegger    0.79
sle        0.76
itzer      0.76
decree     0.75
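The two lists above are the extremes of a ranking of vocabulary tokens by the feature's effect on their logits. The page does not say how those per-token effects were computed. They are commonly obtained by projecting the feature's decoder direction onto the model's unembedding, but that is an assumption here. The sketch below shows only the ranking step. It assumes the effects are already available as a plain dict, and the `logit_effects` dict and `top_and_bottom` helper are hypothetical. The dict is filled with a few values copied from the lists above.

```python
import heapq

# Hypothetical per-token logit effects; values copied from the lists above.
logit_effects = {
    "stein": 0.87, "sb": 0.85, "hardt": 0.85, "Reich": 0.84,
    "neau": -0.70, "glers": -0.68, "plates": -0.66, "Tibetan": -0.62,
}

def top_and_bottom(effects, k):
    """Return the k most positive and the k most negative (token, value) pairs."""
    positive = heapq.nlargest(k, effects.items(), key=lambda kv: kv[1])
    negative = heapq.nsmallest(k, effects.items(), key=lambda kv: kv[1])
    return positive, negative

pos, neg = top_and_bottom(logit_effects, 3)
print(pos)
print(neg)
```

`heapq.nlargest`/`nsmallest` avoid sorting the whole vocabulary when only the top and bottom few tokens are needed.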
Activations Density: 0.025%
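Activation density is usually read as the share of sampled token positions on which the feature fires. That definition, the zero threshold, and the hypothetical `activation_density` helper below are assumptions and are not stated on this page. Under this reading, 0.025% means about 1 firing position in every 4,000 tokens, and the synthetic example reproduces exactly that figure.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of token positions whose activation exceeds `threshold` (assumed 0)."""
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return firing / len(activations)

# Synthetic example: the feature fires at 1 of 4,000 token positions.
acts = [0.0] * 4000
acts[123] = 2.5
density = activation_density(acts)
print(f"{density:.3%}")  # 0.025%
```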