Explanations
punctuation marks and their patterns
Negative Logits
forme          -0.15
isContained    -0.14
nze            -0.14
97             -0.14
ţi             -0.14
lr             -0.13
итом           -0.13
ture           -0.13
ibar           -0.13
ipzig          -0.13
Positive Logits
201            0.30
200            0.28
202            0.24
199            0.22
late           0.21
198            0.19
197            0.18
000            0.18
late           0.17
mid            0.16
Activations Density: 0.024%