Explanations
occurrences of varying degrees of formality in language
Negative Logits
ViewFeatures    -0.78
mukana          -0.76
_]              -0.71
riman           -0.70
Rodrig          -0.70
Kaufman         -0.69
diğini          -0.68
льки            -0.68
Kaufmann        -0.68
exa             -0.67
Positive Logits
Maat            1.04
Aad             0.96
EEC             0.94
Steen           0.92
Maas            0.89
Muir            0.88
Aad             0.88
Baal            0.86
Neel            0.86
oon             0.84
Activations Density: 0.159%