INDEX
Explanations
specific numerical patterns or values, particularly in the context of structures or examples
Negative Logits
Token       Logit
9           -0.17
3           -0.15
4           -0.15
ĺìĿ´        -0.14
8           -0.14
7           -0.14
6           -0.13
väl         -0.13
hang        -0.13
persever    -0.13
Positive Logits
Token       Logit
01          0.57
02          0.57
04          0.57
03          0.56
05          0.56
06          0.56
07          0.54
08          0.52
09          0.50
00          0.37
Activations Density 0.136%