INDEX
Explanations
phrases related to laws, policies, titles, or positions
Negative Logits
Tasman      -0.65
creen       -0.64
erous       -0.64
Reprodu     -0.63
yip         -0.63
Leilan      -0.62
eer         -0.62
Catalyst    -0.61
Laurent     -0.61
Telephone   -0.60
Positive Logits
IJ          1.44
ij          1.40
Ĵ           1.36
ł           1.30
ª           1.27
ı           1.25
ĸ           1.17
Ķ           1.12
ĵ           1.09
¹           1.09
Activations Density: 0.109%