Explanations
patterns of repetition and frequency in various contexts
Negative Logits
First      -0.24
First      -0.19
ricks      -0.17
nearest    -0.17
首         -0.15   (Chinese, "first")
oda        -0.15
closest    -0.15
lyn        -0.15
FIRST      -0.14
рик        -0.14
Positive Logits
third       0.60
fourth      0.59
fifth       0.56
sixth       0.54
third       0.52
second      0.49
THIRD       0.48
seventh     0.48
第三        0.46   (Chinese, "third")
eighth      0.44
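The page does not say how these logit lists were produced. A common approach for sparse-autoencoder features, assumed here rather than confirmed by the source, is to multiply the feature's decoder direction by the model's unembedding matrix and keep the most positive and most negative tokens. The sketch below uses toy, made-up numbers; `logit_effects`, `top_k`, and every vector in it are hypothetical.

```python
# Hypothetical sketch: projecting a feature's decoder direction through an
# unembedding matrix to rank tokens by logit effect. All names and numbers are
# illustrative assumptions, not values from this feature.

def logit_effects(decoder_vec, unembed, vocab):
    """Return (token, score) pairs, where score = sum_d decoder_vec[d] * unembed[d][t].

    decoder_vec: list of d_model floats (the feature's output direction)
    unembed: d_model rows, each a list of len(vocab) floats
    vocab: token strings; pairs are returned so duplicate strings are kept
    """
    scores = [0.0] * len(vocab)
    for d, weight in enumerate(decoder_vec):
        for t, u in enumerate(unembed[d]):
            scores[t] += weight * u
    return list(zip(vocab, scores))


def top_k(effects, k, largest=True):
    """Return the k pairs with the highest scores (or lowest if largest=False)."""
    return sorted(effects, key=lambda pair: pair[1], reverse=largest)[:k]


# Toy example with a 2-dimensional model and a 4-token vocabulary.
vocab = ["first", "second", "third", "fourth"]
decoder_vec = [1.0, -0.5]
unembed = [
    [-0.2, 0.3, 0.5, 0.35],
    [0.1, -0.2, -0.2, 0.0],
]
effects = logit_effects(decoder_vec, unembed, vocab)
print("positive:", top_k(effects, 2))
print("negative:", top_k(effects, 1, largest=False))
```

Returning a list of pairs rather than a dict matters here: the real lists above contain repeated surface strings (for example two "First" entries), which are most likely distinct tokens that differ only in whitespace.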
Activation density: 0.115%
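The source does not define activation density. It is assumed here to be the percentage of sampled tokens on which the feature's activation is above zero. A minimal sketch with a hypothetical `activation_density` helper and synthetic activations, where 23 active tokens out of 20,000 gives 0.115%:

```python
# Hypothetical sketch: activation density as the percentage of sampled tokens
# whose activation exceeds a threshold (assumed definition, default 0).

def activation_density(activations, threshold=0.0):
    """Percentage of activations strictly greater than `threshold`."""
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Synthetic sample: 20,000 tokens, the feature fires on 23 of them.
acts = [0.0] * 20000
for i in range(23):
    acts[i * 800] = 1.0
print(f"{activation_density(acts):.3f}%")  # prints 0.115%
```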