INDEX
Explanations
phrases indicating levels of difficulty or improbability
the phrase "let alone."
Negative Logits
natureconservancy   -0.73
assian              -0.72
cill                -0.69
idian               -0.59
antz                -0.58
ets                 -0.56
edge                -0.54
cale                -0.54
encount             -0.54
peria               -0.53
Positive Logits
alone               1.20
tered               0.92
tering              0.88
hetically           0.85
ting                0.80
icia                0.77
ingly               0.76
ogether             0.72
ardless             0.70
ãĥĥãĥī              0.67
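The positive-logit entry `ãĥĥãĥī` looks like a token shown in the printable-byte alphabet of a byte-level BPE vocabulary rather than as decoded text. The source does not name the tokenizer, so the following is only a sketch under the assumption that it uses the GPT-2-style byte-to-unicode mapping. The helper names `byte_to_unicode_table` and `decode_vocab_token` are hypothetical, introduced here for illustration.

```python
def byte_to_unicode_table():
    """Build the GPT-2-style map from byte value to a printable character.

    Assumption: the dashboard's tokenizer follows this convention.
    """
    # Bytes that are already printable map to themselves.
    printable = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    byte_values = printable[:]
    code_points = printable[:]
    offset = 0
    # Remaining bytes are shifted to code points starting at 256.
    for b in range(256):
        if b not in printable:
            byte_values.append(b)
            code_points.append(256 + offset)
            offset += 1
    return dict(zip(byte_values, map(chr, code_points)))


def decode_vocab_token(token):
    """Turn a vocabulary-form token back into the text it encodes."""
    char_to_byte = {c: b for b, c in byte_to_unicode_table().items()}
    raw = bytes(char_to_byte[ch] for ch in token)
    # Tokens may split a multi-byte character, so replace invalid bytes.
    return raw.decode("utf-8", errors="replace")


if __name__ == "__main__":
    print(decode_vocab_token("ãĥĥãĥī"))
```

Under that assumption, the token corresponds to the bytes `E3 83 83 E3 83 89`, which is valid UTF-8 for two katakana characters. Plain ASCII entries such as `alone` decode to themselves.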
Activations Density 0.021%