INDEX
Explanations
adjective-noun phrases describing divisions or distinctions
instances of the term "ops" and its variations, likely indicative of errors or issues in communication
Negative Logits
SHIP          -0.72
phyl          -0.64
prisoners     -0.62
inmates       -0.61
anesthesia    -0.60
embodiments   -0.59
embr          -0.59
theless       -0.59
jad           -0.59
WAYS          -0.56
Positive Logits
ops           1.18
heet          0.96
ilon          0.96
yright        0.95
oppers        0.92
imus          0.92
ided          0.90
iate          0.89
olitan        0.87
icle          0.86
Activations Density: 0.024%