INDEX
Explanations
phrases indicating a minimum acceptable level or standard
expressions that minimize or downplay situations
Negative Logits
clerosis       -0.77
kefeller       -0.72
axter          -0.69
Berk           -0.67
Codes          -0.65
Travels        -0.64
Lyndon         -0.63
Blueprint      -0.62
iannopoulos    -0.61
thumbnails     -0.60
Positive Logits
conceivable    0.72
recogn         0.69
possible       0.69
favourable     0.69
plausible      0.66
amount         0.65
uner           0.64
imaginable     0.64
egu            0.64
practicable    0.62
Activations Density: 0.029%