INDEX
Explanations
words related to physical structures or components
specific nouns related to tangible objects and structures
Negative Logits
\\\\\\\\       -0.89
atalie         -0.59
DonaldTrump    -0.59
Motion         -0.58
Ventures       -0.58
Recomm         -0.58
ALLY           -0.57
Attempts       -0.57
-+-+           -0.57
Disorder       -0.57
Positive Logits
themselves     1.23
cape           1.21
etter          1.14
etting         1.10
mith           1.07
heet           1.05
linger         1.05
hip            1.02
hift           0.98
pace           0.96
Activations Density 0.445%