Explanations
parts of language pertaining to categorization and classification
Negative Logits
bershka      -1.10
Monfieur     -1.05
itſelf       -1.00
Theſe        -1.00
Shakspeare   -0.97
myſelf       -0.95
raiſ         -0.94
uſed         -0.94
Efq          -0.93
moschino     -0.92
Positive Logits
ber          0.54
said         0.49
TagHelper    0.48
             0.46
em           0.46
di           0.46
to           0.45
’            0.43
for          0.43
the          0.42
Activations Density 0.102%