Explanations
phrases related to consequences or methods
phrases indicating significance or implications of various subjects
Negative Logits
Token     Logit
reb       -0.72
edit      -0.65
greg      -0.59
alted     -0.59
older     -0.59
rex       -0.58
Bene      -0.57
rage      -0.56
ersen     -0.56
icent     -0.56
Positive Logits
Token        Logit
means        3.65
Means        2.62
meant        1.92
mean         1.81
entails      1.53
signifies    1.52
implies      1.50
translates   1.49
equals       1.32
denotes      1.24
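Every top positive token is a form or synonym of "means", which matches the explanations. The following is a minimal sketch of how a ranked logit list like this could be produced. It assumes, without confirmation from this page, that each value is the feature's decoder direction projected through the model's unembedding matrix. `top_logit_effects`, `decoder_dir`, `W_U`, and `vocab` are hypothetical names chosen for illustration.

```python
import numpy as np

def top_logit_effects(decoder_dir, W_U, vocab, k=10):
    """Rank vocabulary tokens by the logit contribution of one feature.

    Assumption (hypothetical setup): decoder_dir is the feature's decoder
    vector with shape (d_model,), W_U is the unembedding matrix with shape
    (d_model, d_vocab), and vocab maps token ids to token strings.
    """
    # One logit contribution per vocabulary token.
    effects = decoder_dir @ W_U
    # Token ids sorted from most negative to most positive contribution.
    order = np.argsort(effects)
    negative = [(vocab[i], float(effects[i])) for i in order[:k]]
    positive = [(vocab[i], float(effects[i])) for i in order[::-1][:k]]
    return negative, positive

# Toy example with a 2-dimensional residual stream and a 4-token vocabulary.
decoder_dir = np.array([1.0, 0.0])
W_U = np.array([[3.0, -1.0, 0.5, 2.0],
                [0.0, 0.0, 0.0, 0.0]])
vocab = ["means", "reb", "edit", "implies"]
neg, pos = top_logit_effects(decoder_dir, W_U, vocab, k=2)
print(neg)  # [('reb', -1.0), ('edit', 0.5)]
print(pos)  # [('means', 3.0), ('implies', 2.0)]
```

Sorting once and reading both ends of the order yields the negative and positive lists in a single pass over the vocabulary.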
Activation density: 0.027%
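A density of 0.027% means the feature fires on roughly 27 of every 100,000 tokens. The sketch below computes density under two assumptions: density is the fraction of token positions where the activation exceeds a threshold, and that threshold is zero. `activation_density` is a hypothetical helper.

```python
import numpy as np

def activation_density(activations, threshold=0.0):
    """Fraction of token positions where the feature is active.

    Assumption: "active" means the activation is strictly above threshold
    (zero by default), as is typical for ReLU-style SAE features.
    """
    acts = np.asarray(activations, dtype=float)
    return float((acts > threshold).mean())

# Toy example: 27 active positions out of 100,000 tokens.
acts = np.zeros(100_000)
acts[:27] = 1.0
density = activation_density(acts)
print(f"{density * 100:.3f}%")  # 0.027%
```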