Explanations
proper nouns, specifically names
the presence of the name "Sab" in various contexts
Negative Logits
Token          Logit
Hawaiian       -0.69
fung           -0.64
人             -0.62
omission       -0.61
Underground    -0.59
tru            -0.58
Fargo          -0.58
SPONSORED      -0.58
ï¸ı            -0.57
killer         -0.57
Positive Logits
Token          Logit
rina            1.33
ģ               0.99
arat            0.98
qua             0.98
onis            0.96
eway            0.96
ril             0.91
eways           0.90
riel            0.89
aram            0.89
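The two logit lists rank tokens by how much this feature pushes each token's output logit down or up. Below is a minimal sketch of that ranking step. It assumes the per-token effects are already available as a plain dict. How the dashboard actually computes those effects is not stated here, and the helper name `top_and_bottom` is hypothetical. The sample values come from the lists above.

```python
def top_and_bottom(effects: dict[str, float], k: int = 10):
    """Hypothetical helper: return the k most negative and k most positive tokens.

    `effects` maps token -> logit effect (assumed precomputed elsewhere).
    """
    # Sort ascending by effect. sorted() is stable, so ties keep insertion order.
    ranked = sorted(effects.items(), key=lambda kv: kv[1])
    bottom = ranked[:k]        # most negative effects first
    top = ranked[::-1][:k]     # most positive effects first
    return bottom, top


# Sample values taken from the lists above.
effects = {
    "rina": 1.33, "arat": 0.98, "qua": 0.98,
    "Hawaiian": -0.69, "fung": -0.64, "killer": -0.57,
}
bottom, top = top_and_bottom(effects, k=2)
print("negative:", bottom)
print("positive:", top)
```

Tied values, such as `arat` and `qua` at 0.98, come out in whatever order the sort leaves them. So the order of tied tokens in the lists above should not be read as meaningful.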
Activation Density: 0.042%
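Activation density is read here as the percentage of tokens on which the feature fires. A density of 0.042% therefore corresponds to roughly 42 firing tokens per 100,000. Below is a minimal sketch under two assumptions: "fires" means an activation strictly greater than zero, and the activations are given as a flat list over some token sample. The dashboard's actual dataset and threshold are not stated, and the function name is hypothetical.

```python
def activation_density(activations: list[float]) -> float:
    """Hypothetical helper: percentage of tokens with activation > 0 (assumed threshold)."""
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > 0)
    return 100.0 * firing / len(activations)


# Illustrative data: a feature that fires on 42 of 100,000 tokens.
acts = [0.0] * 100_000
for i in range(42):
    acts[i * 1000] = 1.5
print(activation_density(acts))  # prints 0.042
```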