INDEX
Explanations
words related to unusual or abnormal situations
Negative Logits
noinspection  -0.16
bes           -0.14
inous         -0.14
ted           -0.14
thed          -0.14
imid          -0.14
mund          -0.14
mean          -0.13
andas         -0.13
isans         -0.13
Positive Logits
ities         0.27
-shaped       0.23
itics         0.20
ly            0.20
-looking      0.19
ball          0.19
ity           0.19
-ball         0.19
discrepan     0.19
iti           0.18
Activations Density 0.063%