INDEX
Explanations
terms related to names and titles
proper nouns, particularly names of people and places
Negative Logits
innocence      -0.72
welf           -0.70
temptation     -0.66
aristocracy    -0.66
knees          -0.65
moderation     -0.64
pasture        -0.64
linen          -0.64
camel          -0.64
millennium     -0.64
Positive Logits
INAL           1.06
ICT            0.84
ALLY           0.80
OY             0.72
oti            0.72
ーン           0.71
oxide          0.70
INO            0.69
xual           0.69
nels           0.69
Activations Density: 0.114%