Explanations
occurrences of specific names or titles, particularly those beginning with "Gi."
Negative Logits
Token     Logit
ahoo      -0.16
ofire     -0.16
ington    -0.15
erman     -0.15
argo      -0.15
Ñĥ        -0.15
ansom     -0.14
al        -0.14
agen      -0.14
ersed     -0.14
Positive Logits
Token       Logit
'gc         0.16
ippi        0.15
žel         0.15
격          0.15
_RM         0.15
isay        0.14
CRY         0.14
curacy      0.14
à¸Ļส       0.14
sublicense  0.14
Activations Density 0.014%