INDEX
Explanations
names of individuals
references to individual names and key figures in the context
Negative Logits
usterity       -0.74
onymous        -0.70
Chilean        -0.70
vironment      -0.70
ructure        -0.70
ulhu           -0.69
ainment        -0.67
eral           -0.66
conspicuous    -0.64
alid           -0.63
Positive Logits
Jacobs         1.31
tones          0.90
aunders        0.78
mann           0.68
Amon           0.67
inelli         0.67
intrins        0.67
robe           0.66
aign           0.66
hetti          0.66
Activations Density: 0.004%