INDEX
Explanations
instances of phrases or expressions that refer to sets of items or categories
Negative Logits
535          -0.18
pedia        -0.17
etik         -0.15
ONGL         -0.15
.metamodel   -0.14
ologia       -0.14
ongs         -0.14
ologically   -0.14
ior          -0.14
oui          -0.14
Positive Logits
ï¸ı          0.16
efon         0.15
illin        0.14
amilia       0.14
etine        0.14
(s           0.14
erb          0.14
поÑĢ         0.13
á»ĵi         0.13
ames         0.13
Activations Density 0.010%
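
Some token strings in the logit lists (for example `ï¸ı`, `поÑĢ`, `á»ĵi`) look like the display form that byte-level BPE tokenizers use for raw bytes. The sketch below shows how such strings could be turned back into readable text. It assumes a GPT-2 style byte-to-character mapping, which this page does not confirm; `byte_decoder` and `decode_token` are hypothetical helper names, not part of any dashboard API.

```python
def byte_decoder() -> dict:
    """Inverse of a GPT-2 style byte-to-character table (assumed, not confirmed)."""
    # Bytes in these ranges are shown as the character with the same code point.
    keep = (
        list(range(0x21, 0x7F))     # '!' .. '~'
        + list(range(0xA1, 0xAD))   # U+00A1 .. U+00AC
        + list(range(0xAE, 0x100))  # U+00AE .. U+00FF
    )
    table = {chr(b): b for b in keep}
    # Every other byte is shown as a character starting at code point 256.
    n = 0
    for b in range(256):
        if b not in keep:
            table[chr(256 + n)] = b
            n += 1
    return table


def decode_token(display: str) -> str:
    """Map a displayed token string back to text, byte by byte."""
    table = byte_decoder()
    raw = b"".join(
        # Characters outside the table (already readable text) are kept as UTF-8.
        bytes([table[ch]]) if ch in table else ch.encode("utf-8")
        for ch in display
    )
    return raw.decode("utf-8", errors="replace")


if __name__ == "__main__":
    for tok in ["pedia", "поÑĢ", "á»ĵi"]:
        print(repr(tok), "->", repr(decode_token(tok)))
```

Under this assumption, plain ASCII tokens such as `pedia` pass through unchanged, while the remapped entries decode to ordinary Unicode text. If the underlying model uses a different tokenizer, the mapping would not apply.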