Explanations
numerical expressions or patterns
Negative Logits
urette    -0.17
laus      -0.17
025       -0.17
lev       -0.16
022       -0.16
024       -0.16
Ŀ         -0.15
624       -0.15
Bruce     -0.14
Bruce     -0.14
Positive Logits
34        0.42
35        0.41
33        0.41
36        0.39
37        0.39
32        0.37
38        0.36
Thirty    0.34
31        0.34
39        0.33
Activations Density: 0.084%