INDEX
Explanations
specific nouns describing abstract concepts or physical objects
references to various categories, groups, or collections of things
Negative Logits
Token        Logit
Doors        -0.50
lobb         -0.45
··           -0.44
invites      -0.44
Budd         -0.43
Advocate     -0.43
behalf       -0.43
abad         -0.43
Eng          -0.42
odore        -0.42
Positive Logits
Token        Logit
of            1.03
luster        0.79
Of            0.79
Of            0.77
icularly      0.76
thereof       0.75
OF            0.69
aditional     0.69
idable        0.66
atical        0.64
Activations Density: 0.433%