Explanations
references to quantities and proportions
Negative Logits
REW      -0.16
ーツ     -0.15
assel    -0.14
inic     -0.14
aper     -0.14
agas     -0.14
543      -0.14
iores    -0.13
架       -0.13
fine     -0.13
Positive Logits
third    0.63
third    0.59
THIRD    0.55
Third    0.54
fifth    0.54
Third    0.52
-third   0.52
第三     0.49
fourth   0.49
_third   0.46
Activations Density 0.051%