INDEX
Explanations
phrases that indicate hands-on or practical experiences
Negative Logits
lef       -0.15
Mate      -0.14
odore     -0.14
-hearted  -0.14
loyd      -0.14
nt        -0.14
æį·       -0.14   (byte-level BPE display of 捷)
-song     -0.13
Ones      -0.13
Jennings  -0.13
Positive Logits
/down     0.21
approach  0.21
/on       0.20
edly      0.20
ery       0.19
/off      0.18
urance    0.18
Approach  0.17
/full     0.17
appro     0.17
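The dashboard does not say how these logit values are computed. A common convention for sparse-autoencoder feature dashboards is to project the feature's decoder direction through the model's unembedding matrix and list the most negative and most positive results; the sketch below assumes that convention. The functions `logit_effects` and `top_and_bottom`, and all of the toy numbers, are hypothetical and only illustrate how such ranked lists could be produced.

```python
from typing import Dict, List, Tuple

def logit_effects(decoder_dir: List[float],
                  unembed: List[List[float]],
                  vocab: List[str]) -> Dict[str, float]:
    """Project a (hypothetical) feature decoder direction through an unembedding.

    `unembed` has one row per model dimension and one column per vocab token,
    so the value for token t is sum_i decoder_dir[i] * unembed[i][t].
    """
    return {
        token: sum(d * row[t] for d, row in zip(decoder_dir, unembed))
        for t, token in enumerate(vocab)
    }

def top_and_bottom(effects: Dict[str, float], k: int
                   ) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most negative and the k most positive (token, value) pairs."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])  # ascending by value
    negative = ranked[:k]            # most negative first
    positive = ranked[::-1][:k]      # most positive first
    return negative, positive

# Toy example: a 2-dimensional model with a 4-token vocabulary (made-up numbers).
vocab = ["approach", "/down", "Jennings", "odore"]
unembed = [[0.2, 0.1, -0.1, -0.2],
           [0.0, 0.3, -0.1, 0.2]]
decoder_dir = [1.0, 0.5]

negative, positive = top_and_bottom(logit_effects(decoder_dir, unembed, vocab), k=2)
print("Negative:", negative)
print("Positive:", positive)
```

Sorting once and slicing from both ends keeps the negative and positive lists consistent with each other; a real dashboard would do the same projection over the full vocabulary.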
Activations Density 0.198%
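Activation density is usually read as the share of token positions on which the feature is active (activation above zero); at 0.198%, this feature fires on roughly 2 of every 1,000 tokens. The dashboard does not state its threshold or the dataset it was measured on, so the sketch below assumes a strictly-positive threshold. The function `activation_density` and the sample activations are hypothetical.

```python
from typing import Sequence

def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of token positions where the feature activation exceeds `threshold`."""
    if len(activations) == 0:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)

# Made-up activations: the feature fires on 2 of 1,000 token positions.
acts = [0.0] * 998 + [1.3, 0.7]
density = activation_density(acts)
print(f"Activations Density {density:.3f}%")  # prints "Activations Density 0.200%"
```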