INDEX
Explanations
mentions of prominent individuals, particularly with the name "Hilary"
Negative Logits
ies      -0.16
ocrat    -0.15
itr      -0.15
_ABS     -0.14
istan    -0.14
ipl      -0.14
raj      -0.14
stras    -0.14
itat     -0.14
inch     -0.14
Positive Logits
ary      0.24
bert     0.24
Hil      0.22
ario     0.22
ARIO     0.21
ights    0.20
ário     0.18
BERT     0.18
ARY      0.18
fsp      0.18
Activations Density: 0.012%