INDEX
Explanations
negations or phrases expressing the absence or rejection of something
Negative Logits
Token      Logit
ans        -0.17
lings      -0.14
gow        -0.14
opot       -0.14
cul        -0.14
tra        -0.14
encers     -0.14
findAll    -0.14
всего      -0.14
ndon       -0.13
Positive Logits
Token      Logit
amount     0.31
matter     0.26
amount     0.25
Amount     0.22
aspect     0.21
one        0.21
single     0.20
doubt      0.19
things     0.19
Amount     0.19
Activations Density 0.046%
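
The page does not define the density figure above. One common reading is the share of token positions on which the feature fires at all. The sketch below illustrates that reading only. The function name `activation_density`, the zero threshold, and the toy data are all assumptions, not details taken from this page or from the tool that produced it.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of positions where the feature is
    active. The page itself does not state the definition.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical toy data: 10,000 token positions, 5 of them active.
toy = [0.0] * 9995 + [0.7, 1.2, 0.3, 2.1, 0.9]
print(f"{activation_density(toy):.3%}")  # prints 0.050%
```

On this reading, a density of 0.046% would mean the feature is active on roughly 46 of every 100,000 token positions.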