INDEX
Explanations
proper nouns denoting people, organizations, and locations
Negative Logits
blast      -0.66
llular     -0.64
IFT        -0.63
abs        -0.63
farm       -0.63
igers      -0.62
ql         -0.62
ield       -0.61
pez        -0.61
heights    -0.60
Positive Logits
ly             0.82
edly           0.76
Ĥİ             0.71
translation    0.70
eous           0.70
Sources        0.69
ĪĴ             0.69
sources        0.69
Sources        0.68
tained         0.68
Activations Density: 0.041%
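
One common reading of an activation density figure like the one above is the share of sampled token positions on which the feature activates at all. The sketch below illustrates that reading only; the function name `activation_density`, the zero threshold, and the toy data are hypothetical and are not taken from the dashboard, which does not state how its figure is computed.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: "density" means the share of positions where the feature fires.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Toy data (made up for illustration): 100,000 token positions, 41 of which fire.
toy_activations = [0.0] * 99_959 + [1.0] * 41
density = activation_density(toy_activations)
print(f"{density:.3%}")
```

Under this reading, a density this low would mean the feature fires on only a few tokens in every ten thousand.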