INDEX
Explanations
digits or numbers embedded in text
specific identifiers and codes, likely related to a classification or categorization system
Negative Logits
OFF        -0.80
metic      -0.75
COM        -0.71
GREEN      -0.68
FOX        -0.66
ALE        -0.66
HA         -0.66
EDITION    -0.66
DIS        -0.65
MUST       -0.64
Positive Logits
s    1.43
n    1.43
b    1.42
d    1.41
f    1.41
p    1.39
h    1.35
t    1.34
l    1.33
c    1.32
Activations Density 0.291%