INDEX
Explanations
words related to options and choices
Negative Logits
Fifth     -0.39
fifth     -0.37
Five      -0.32
five      -0.32
five      -0.31
äºĶ       -0.30
äºĶ       -0.28
_five     -0.28
-five     -0.27
Five      -0.27
Positive Logits
6         0.26
7         0.26
8         0.17
fout      0.16
Ù         0.15
678       0.15
६         0.15
Seven     0.15
[vi       0.15
seven     0.14
Activations Density 0.033%