INDEX
Explanations
numbers in a specific format
repeated patterns of punctuation or numerical sequences
Negative Logits
uggest      -0.65
ngth        -0.65
melanch     -0.64
fantasies   -0.58
reconc      -0.58
gad         -0.58
vag         -0.55
Cros        -0.55
ioned       -0.55
ashtra      -0.54
Positive Logits
000         1.13
00          0.98
9           0.97
0           0.93
8           0.91
6           0.90
048         0.89
12          0.88
600         0.88
09          0.86
Activations Density 0.110%