INDEX
Explanations
numerical values
numerical representations or identifiers associated with variables and figures
Negative Logits
Token      Logit
OFF        -0.88
GOODMAN    -0.79
COM        -0.78
MAY        -0.74
FOX        -0.69
MUST       -0.68
COVER      -0.68
SHOULD     -0.67
IPS        -0.66
RAW        -0.65
Positive Logits
Token      Logit
b          1.47
n          1.45
d          1.43
h          1.41
f          1.40
s          1.39
p          1.37
l          1.34
r          1.34
t          1.34
Activations Density: 0.253%