Explanations
terms related to shallowness or superficiality
Negative Logits
CLASSIFIED    -0.75
OX            -0.70
starring      -0.70
haw           -0.69
signed        -0.68
Revolution    -0.68
unknown       -0.67
FORE          -0.66
HCR           -0.66
Chronicles    -0.63
Positive Logits
est           1.03
ly            1.01
(<            0.90
ened          0.88
shallow       0.88
ening         0.87
slope         0.83
depth         0.81
itudinal      0.81
ness          0.80
Activations Density: 0.004%