Explanations
proper nouns and their associated identifiers
Negative Logits
Token     Logit
spre      -0.35
Jej       -0.35
esc       -0.33
挑        -0.32
Sprung    -0.32
цов       -0.32
Clik      -0.32
spr       -0.31
Faire     -0.31
cag       -0.31
Positive Logits
Token     Logit
ND        1.46
NT        1.45
NB        1.36
NC        1.36
NF        1.35
NR        1.34
NM        1.34
NP        1.33
NH        1.33
nt        1.33
Activations Density 0.505%