INDEX
Explanations
cone-related phrases or words
references to cones and their various contexts
Negative Logits
emb        -0.80
conscious  -0.80
ulators    -0.80
ulator     -0.75
ulative    -0.72
ulates     -0.72
oken       -0.71
ulated     -0.69
Feet       -0.67
athered    -0.67
Positive Logits
cone       1.01
xon        0.97
cones      0.93
Sentinel   0.88
utic       0.82
Canaveral  0.75
hower      0.69
Coul       0.66
slopes     0.66
applic     0.66
Activations Density 0.028%