Explanations
phrases indicating a significant degree or extent
Negative Logits
oeff      -0.17
hip       -0.17
hist      -0.16
ru        -0.15
atab      -0.15
hape      -0.14
nist      -0.14
dec       -0.14
ÑĮе (ье)  -0.14
еÑī (ещ)  -0.14
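Some tokens in these lists, such as ÑĮе and еÑī above and ë°© below, appear to be raw vocabulary strings from a GPT-2-style byte-level BPE tokenizer, where each UTF-8 byte is shown as a printable stand-in character. The sketch below inverts that byte-to-character table to recover readable text. It assumes the tokenizer uses the standard GPT-2 byte mapping. It also assumes that characters outside the map (like the Cyrillic е in ÑĮе) were already decoded by the page, so they are re-encoded as UTF-8. `decode_token` is a hypothetical helper name.

```python
# GPT-2-style byte-to-unicode table: every byte 0-255 maps to a printable char.
def bytes_to_unicode() -> dict[int, str]:
    byte_values = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    code_points = byte_values[:]
    extra = 0
    # Bytes that are not printable get shifted into the range starting at U+0100.
    for b in range(256):
        if b not in byte_values:
            byte_values.append(b)
            code_points.append(256 + extra)
            extra += 1
    return {b: chr(c) for b, c in zip(byte_values, code_points)}


BYTE_DECODER = {ch: b for b, ch in bytes_to_unicode().items()}


def decode_token(token: str) -> str:
    """Hypothetical helper: turn a byte-level token string back into text."""
    raw = bytearray()
    for ch in token:
        if ch in BYTE_DECODER:
            raw.append(BYTE_DECODER[ch])
        else:
            # Assumption: characters outside the byte map were already decoded.
            raw.extend(ch.encode("utf-8"))
    # Partial multi-byte fragments become U+FFFD instead of raising.
    return raw.decode("utf-8", errors="replace")


for tok in ["ÑĮе", "еÑī", "-ÑĤаки", "ë°©", "ìĦľëĬĶ"]:
    print(tok, "->", decode_token(tok))
```

Pure-ASCII tokens such as `oeff` pass through unchanged, because printable ASCII bytes map to themselves.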
Positive Logits
-ÑĤаки (-таки)  0.18
Occurred        0.15
SEA             0.15
alker           0.14
ë°© (방)         0.14
»               0.14
ìĦľëĬĶ (서는)    0.13
.dsl            0.13
akers           0.13
(er             0.13
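Positive and negative logit lists like these are commonly produced by projecting a feature's decoder direction through the model's unembedding matrix and keeping the k highest and lowest scoring tokens. The sketch below assumes that procedure; it is not confirmed to be how this dashboard computes its lists. The names `w_dec`, `w_u`, and `top_logit_tokens` are hypothetical, and the toy weights are made up.

```python
import numpy as np


def top_logit_tokens(w_dec, w_u, vocab, k=10):
    """Rank vocabulary tokens by a feature's direct effect on the logits.

    w_dec: (d_model,) decoder direction of one feature (hypothetical name).
    w_u:   (d_model, vocab_size) unembedding matrix (hypothetical name).
    vocab: token strings aligned with the columns of w_u.
    """
    logits = np.asarray(w_dec) @ np.asarray(w_u)  # shape (vocab_size,)
    order = np.argsort(logits)                    # ascending order
    negative = [(vocab[i], float(logits[i])) for i in order[:k]]
    positive = [(vocab[i], float(logits[i])) for i in order[::-1][:k]]
    return negative, positive


# Toy example with made-up weights (not any real model's).
w_u = np.array([[1.0, 0.0, -1.0, 0.5],
                [0.0, 1.0, 0.0, -2.0]])
w_dec = np.array([1.0, 0.0])
neg, pos = top_logit_tokens(w_dec, w_u, ["a", "b", "c", "d"], k=2)
print(neg)  # [('c', -1.0), ('b', 0.0)]
print(pos)  # [('a', 1.0), ('d', 0.5)]
```

Because the score is a single matrix-vector product, it reflects only the feature's direct path to the output. It ignores any effect routed through later layers.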
Activation Density 0.041%
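An activation density of 0.041% means the feature is active on about 4 of every 10,000 tokens, a fraction of 0.00041. The minimal sketch below computes density as the fraction of token positions whose activation exceeds a threshold. The threshold of 0.0, the helper name `activation_density`, and the toy sample are assumptions; a dashboard may compute density over its own fixed token sample.

```python
import numpy as np


def activation_density(acts, threshold=0.0):
    """Fraction of token positions where the feature activation exceeds threshold.

    Hypothetical helper; the default threshold of 0.0 is an assumption.
    """
    acts = np.asarray(acts, dtype=float)
    return float(np.mean(acts > threshold))


# Toy sample: 41 active positions out of 100,000 tokens.
acts = np.zeros(100_000)
acts[:41] = 2.5
density = activation_density(acts)
print(f"{density:.3%}")  # 0.041%
```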