INDEX
Explanations
phrases related to judgment or assessment
prepositions indicating time or frequency
Negative Logits
Token        Logit
LESS         -0.75
Defeat       -0.73
Archdemon    -0.71
FTWARE       -0.61
Palest       -0.59
NESS         -0.57
idi          -0.56
oranges      -0.54
Rampage      -0.54
Atkinson     -0.53
Positive Logits
Token        Logit
ived         0.86
stood        0.84
roph         0.80
ached        0.79
occasion     0.79
ev           0.76
versely      0.75
itud         0.74
virt         0.72
alien        0.72
Activations Density: 0.081%