INDEX
Explanations
instances of punctuation marks at the end of sentences
instances of classification or categorization terminology
Negative Logits
istor       -0.67
ÃĥÃĤ        -0.60
tradem      -0.60
(),         -0.59
incarn      -0.57
eleph       -0.56
ovo         -0.54
ÃĥÃĤÃĥÃĤ    -0.53
incarn      -0.53
());        -0.53
Positive Logits
[           2.69
[           2.04
["          2.01
[/          1.94
[-          1.90
[/          1.81
['          1.77
[(          1.71
[]          1.70
[*          1.70
Activations Density 0.140%