Explanations
words related to health, illness, sobriety, and being awake or asleep
terms related to health issues and conditions
Negative Logits
Token        Logit
cascade      -0.61
Empire       -0.61
ILA          -0.60
awards       -0.59
unexpl       -0.57
opacity      -0.57
similarity   -0.56
reconc       -0.55
Flavoring    -0.55
fasc         -0.55
Positive Logits
Token        Logit
enough       1.08
enough       1.06
Enough       0.85
footed       0.84
again        0.83
abouts       0.81
asleep       0.78
ridden       0.77
ired         0.77
wired        0.77
Activations Density 0.303%