Explanations
references to temperature or thermodynamics
Negative Logits
Theſe        -1.17
themſelves   -0.98
Beſ          -0.96
ſelves       -0.96
ſelf         -0.94
ſever        -0.94
ſel          -0.91
ſeveral      -0.91
―――――        -0.90
Anſ          -0.89
Positive Logits
T            2.08
T            1.80
t            1.71
getT         1.31
pT           1.08
M            1.04
cT           1.03
S            1.03
mT           1.00
Т            1.00
Activation Density: 0.176%