Explanations
instances of the word "be" followed by other words
statements about potential or hypothetical situations
Negative Logits
tends     -0.63
consumes  -0.62
seldom    -0.62
airy      -0.61
plex      -0.60
often     -0.60
appears   -0.59
ele       -0.59
Kub       -0.58
verend    -0.57
Positive Logits
ħĭ        0.98
feas      0.83
mistaken  0.76
conce     0.75
someday   0.74
ivably    0.72
ADRA      0.72
efe       0.70
oÄŁ       0.69
DACA      0.68
Activations Density 0.150%