INDEX
Explanations
proper nouns and names associated with individuals or locations
Negative Logits
BAD         -0.17
uc          -0.14
tub         -0.14
odp         -0.14
ä¼ı         -0.14
-BEGIN      -0.14
æ°ĹãģĮ      -0.13
ikipedia    -0.13
Aws         -0.13
zo          -0.13
Positive Logits
Ab          0.22
ab          0.21
querque     0.19
-ab         0.18
ance        0.18
afia        0.17
enant       0.17
AB          0.17
.Ab         0.16
Ab          0.16
Activations Density 0.037%