Explanations
proper nouns
end-of-text markers
NEGATIVE LOGITS
Token       Logit
concess     -0.69
scrut       -0.66
iden        -0.61
ŃĶ          -0.61
ĺħ          -0.60
isse        -0.60
igslist     -0.59
bowel       -0.58
opsis       -0.57
pill        -0.57

POSITIVE LOGITS
Token       Logit
vernment     1.08
iants        0.96
glers        0.96
roups        0.90
ORGE         0.89
raphic       0.87
hetto        0.83
rets         0.83
irlfriend    0.82
stones       0.82
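Logit lists like the two above are commonly produced by projecting a feature's decoder direction through the model's unembedding matrix and keeping the smallest and largest entries. The sketch below assumes that procedure; it is not necessarily how this dashboard computed its values. The names `w_dec`, `w_unembed`, and `vocab`, the helper `top_logits`, and the toy sizes are all hypothetical, and random data stands in for real model weights.

```python
import numpy as np

def top_logits(w_dec, w_unembed, vocab, k=10):
    """Return the k most negative and k most positive (token, logit) pairs.

    Hypothetical helper (assumed procedure, not the dashboard's code):
      w_dec:     (d_model,) decoder direction for one feature
      w_unembed: (d_model, d_vocab) unembedding matrix
      vocab:     list of d_vocab token strings
    """
    logits = w_dec @ w_unembed          # direct effect of the feature on each token's logit
    order = np.argsort(logits)          # token indices sorted by ascending logit
    negative = [(vocab[i], float(logits[i])) for i in order[:k]]
    positive = [(vocab[i], float(logits[i])) for i in order[::-1][:k]]
    return negative, positive

# Toy weights: random stand-ins, not a real model.
rng = np.random.default_rng(0)
d_model, d_vocab = 16, 50
vocab = [f"tok{i}" for i in range(d_vocab)]
w_dec = rng.normal(size=d_model)
w_unembed = rng.normal(size=(d_model, d_vocab))

neg, pos = top_logits(w_dec, w_unembed, vocab, k=10)
for tok, val in pos[:3]:
    print(f"{tok}\t{val:.2f}")
```

Sorting once and slicing both ends keeps the negative list ascending (most negative first) and the positive list descending, matching the order of the tables above.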
Activation Density: 0.128%
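Activation density is usually read as the fraction of sampled tokens on which the feature is active, so 0.128% corresponds to roughly 1 token in 780. The sketch below assumes that activations have been collected into a single array and that "active" means strictly greater than zero. Both are assumptions, and `activation_density` is a hypothetical helper.

```python
import numpy as np

def activation_density(acts, threshold=0.0):
    """Fraction of tokens whose activation exceeds `threshold` (hypothetical helper)."""
    acts = np.asarray(acts)
    return float(np.mean(acts > threshold))

# Toy example: 10,000 token activations, 13 of them positive.
acts = np.zeros(10_000)
acts[:13] = 1.5
print(f"{activation_density(acts):.3%}")  # prints 0.130%
```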