INDEX
Explanations
key terms and phrases that indicate quantitative measurements or assessments
Negative Logits
jah         -0.17
apı         -0.16
kos         -0.16
orges       -0.14
mate        -0.14
_HW         -0.14
вÑĸ         -0.14
_tC         -0.14
uer         -0.14
grieving    -0.13
Positive Logits
atron       0.15
atica       0.15
ampo        0.15
amo         0.15
reon        0.15
othermal    0.14
леÑĩ        0.14
XD          0.14
.tt         0.14
och         0.14
Activations Density 0.028%
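The density figure above is not defined on this page. A minimal sketch, assuming it is the percentage of token positions at which the feature's activation is above zero, might look like the following. The function name `activation_density`, the `threshold` parameter, and the sample data are all hypothetical and are not taken from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the percentage of activations strictly above `threshold`.

    Assumption: "density" means the share of token positions where the
    feature fires at all; the dashboard may define it differently.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical data: 10,000 token positions, 3 of which fire.
sample = [0.0] * 9997 + [0.4, 1.2, 0.05]
print(f"Activations Density {activation_density(sample):.3f}%")
```

Under this reading, a value of 0.028% would correspond to the feature firing on roughly 28 out of every 100,000 tokens.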