INDEX
Explanations
specific proper nouns and titles, particularly in scientific and historical contexts
proper nouns and technical identifiers
Negative Logits
↵↵  -0.33
peran  -0.31
ويكيپيديا  -0.30
vainilla  -0.29
manteniendo  -0.27
kentang  -0.27
Còn  -0.27
prefier  -0.27
sure  -0.26
cuanto  -0.26
Positive Logits
ſelf  0.77
ſche  0.77
featureID  0.76
ſch  0.76
ſta  0.75
ItemBackground  0.73
ſchaft  0.73
Houſe  0.72
―――――  0.72
Majefty  0.71
Activations Density: 1.714%