Explanations
occurrences of specific structured phrases and relationships in a text
Negative Logits
ehler            -0.16
ripp             -0.15
.scalablytyped   -0.14
dux              -0.14
zyst             -0.14
ibold            -0.14
renc             -0.14
rana             -0.13
á»ĵn              -0.13
akhir            -0.13
Positive Logits
Rebel            0.14
strapon          0.13
886              0.13
443              0.13
ofi              0.12
844              0.12
اÙĦÙī            0.12
disadv           0.12
373              0.12
igo              0.12
Activations Density: 0.153%