Explanations
phrases that indicate fairness or appropriateness in various contexts
Negative Logits
sworth        -0.17
maal          -0.16
Beled         -0.15
roduction     -0.15
endale        -0.14
omer          -0.14
ngen          -0.14
(Source       -0.14
ाà¤Ĭ           -0.14
ache          -0.14
Positive Logits
ably           0.24
-sized         0.24
ately          0.21
sized          0.20
hof            0.18
-priced        0.17
decent         0.17
fair           0.16
Sized          0.16
οι             0.15
Activations Density: 0.066%