INDEX
Explanations
proper nouns
different forms of verbs or verb endings
Negative Logits
ï¸ı  -0.80
Janeiro  -0.78
Siren  -0.74
ģ«  -0.74
plun  -0.72
Trend  -0.69
Unix  -0.67
ï¸  -0.67
Priv  -0.64
Azerbai  -0.63
Positive Logits
eston  0.77
hof  0.75
igans  0.74
aum  0.73
eday  0.73
loe  0.70
AFB  0.69
jee  0.68
»Ĵ  0.67
oe  0.66
Activations Density 0.216%
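
As a rough illustration of what a density figure like the one above could mean, the sketch below treats activation density as the fraction of token positions on which the feature fires. This definition, the function name `activation_density`, the `threshold` parameter, and the sample data are all assumptions made for illustration; the page itself does not state how the number is computed.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumed definition, not confirmed by the source page.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: the feature fires on 27 of 12,500 token positions.
sample = [1.0] * 27 + [0.0] * 12473
density = activation_density(sample)
print(f"{density:.3%}")  # prints 0.216%
```

Under this reading, a density of 0.216% would correspond to the feature being active on roughly 2 of every 1,000 tokens.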