Explanations
proper names of people or entities
Negative Logits
Laos          -0.70
á¹            -0.66
irlf          -0.65
Proposition   -0.63
ãĥŁ           -0.61
ruary         -0.61
Tea           -0.61
Guinea        -0.59
Nicaragua     -0.59
issance       -0.58
Positive Logits
antage        0.72
elson         0.70
arkin         0.67
owitz         0.65
iel           0.63
kowski        0.63
edin          0.62
taker         0.61
elsen         0.60
agu           0.60
Activations Density: 0.054%