Explanations
phrases related to legal and political contexts
special characters or unique symbols
Negative Logits
Token       Logit
Dupl        -0.72
wagen       -0.70
Farn        -0.68
Kitt        -0.65
izabeth     -0.65
sacrific    -0.65
conduc      -0.65
destro      -0.63
Eisen       -0.63
itaire      -0.62
Positive Logits
Token       Logit
ª           1.13
Ĵ           1.09
¹           0.99
ł           0.97
IJ           0.96
ij           0.95
¼           0.92
ı           0.91
³           0.90
Ĥ           0.90
Activations Density: 0.103%