Explanations
mentions of people and their roles or actions in historical contexts
Negative Logits
antan      -0.09
bstract    -0.08
endment    -0.08
utzer      -0.08
ÃŃst       -0.08
orgen      -0.08
fcn        -0.08
ÙĪÙĪ       -0.08
getti      -0.08
olon       -0.07
Positive Logits
rec          0.07
res          0.06
dipl         0.06
intra        0.06
minor        0.06
fe           0.06
correspond   0.06
128          0.05
ani          0.05
pro          0.05
Activations Density: 0.001%