INDEX
Explanations
years and centuries
historical dates and events
Negative Logits
oppable       -0.82
malink        -0.75
arty          -0.75
grass         -0.72
ynthesis      -0.66
gment         -0.65
ynchronous    -0.65
unal          -0.62
treadmill     -0.61
insky         -0.60
Positive Logits
BCE           1.66
BC            1.51
CE            1.50
AD            1.42
AD            1.35
BC            1.33
AH            1.15
bc            1.09
BB            1.03
AE            1.01
Activations Density: 0.062%