Explanations
proper nouns, specifically names
Negative Logits
GO          -0.15
enes        -0.15
weg         -0.14
uka         -0.14
ồng         -0.14
Feinstein   -0.14
amm         -0.14
ɵ           -0.14
sterol      -0.13
jon         -0.13
Positive Logits
oslav       0.17
虎          0.14
vester      0.14
tee         0.14
æŃ          0.14
inox        0.14
minecraft   0.14
Pel         0.14
"value      0.14
458         0.13
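The two lists above rank vocabulary tokens by how strongly this feature pushes their logits down or up. A common way to build such lists is to project the feature's decoder direction onto the unembedding matrix W_U and keep the extremes. Whether this dashboard computes them exactly that way is an assumption here. The sketch uses plain Python lists, ignores the final LayerNorm, and the names `logit_effects`, `top_and_bottom`, `decoder_vector`, and `unembed` are hypothetical.

```python
from typing import List, Sequence, Tuple


# Hypothetical helper: direct logit effect of one feature on every vocab token.
# Assumption: effect[t] = sum_i decoder_vector[i] * unembed[i][t], i.e. the
# feature's decoder direction dotted with column t of W_U (d_model x vocab).
# The final LayerNorm and any bias terms are ignored.
def logit_effects(decoder_vector: Sequence[float],
                  unembed: Sequence[Sequence[float]]) -> List[float]:
    d_model = len(decoder_vector)
    vocab_size = len(unembed[0])
    return [
        sum(decoder_vector[i] * unembed[i][t] for i in range(d_model))
        for t in range(vocab_size)
    ]


# Return the k most positive and k most negative (token, effect) pairs,
# each list ordered from strongest to weakest effect.
def top_and_bottom(effects: Sequence[float], vocab: Sequence[str], k: int
                   ) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    if k <= 0:
        return [], []
    order = sorted(range(len(effects)), key=lambda t: effects[t])
    negative = [(vocab[t], effects[t]) for t in order[:k]]
    positive = [(vocab[t], effects[t]) for t in reversed(order[-k:])]
    return positive, negative


if __name__ == "__main__":
    # Toy 2-dimensional model with a 4-token vocabulary (made-up numbers).
    decoder = [1.0, -0.5]
    w_u = [[0.2, -0.1, 0.0, 0.3],
           [0.4, 0.2, -0.2, 0.0]]
    pos, neg = top_and_bottom(logit_effects(decoder, w_u), ["a", "b", "c", "d"], 2)
    print("positive:", pos)
    print("negative:", neg)
```

With a real model, `decoder_vector` would be the feature's row of the SAE decoder and `unembed` the model's W_U; sorting the full vocabulary once and slicing both ends gives the negative and positive lists in a single pass.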
Activation Density: 0.009%
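Activation density is the share of sampled tokens on which the feature is active, so 0.009% works out to about 9 tokens in every 100,000. A minimal sketch, assuming "active" means an activation strictly above a threshold of 0 (the dashboard's actual threshold and token sample are not stated here); `activation_density` is a hypothetical helper name.

```python
from typing import Iterable


# Hypothetical helper: percentage of tokens whose feature activation exceeds
# `threshold` (assumed to be 0 here).
def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    total = 0
    active = 0
    for a in activations:
        total += 1
        if a > threshold:
            active += 1
    if total == 0:
        raise ValueError("no activations supplied")
    return 100.0 * active / total


if __name__ == "__main__":
    # 9 active tokens out of 100,000 gives a density of about 0.009%.
    sample = [0.0] * 99_991 + [1.0] * 9
    print(f"{activation_density(sample):.3f}%")
```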