INDEX
Explanations
historical and philosophical terms, potentially related to a specific period or individual in history
Negative Logits
Carbuncle    -0.78
DOWN         -0.77
Nadu         -0.72
Narr         -0.68
eleph        -0.67
GEAR         -0.66
Columbia     -0.65
worthy       -0.64
Dwell        -0.63
uyomi        -0.62
Positive Logits
vered        1.09
cking        1.08
lder         1.06
elin         1.04
els          1.03
ulner        1.01
ck           1.00
eling        0.96
nder         0.95
clair        0.94
Activations Density: 6.736%