Explanations
adjectives and phrases related to attributes or characteristics
key terms related to classifications and identity distinctions
Negative Logits
«           -0.82
�           -0.80
ãĢĮ         -0.77
§§          -0.68
ãĢĮ         -0.66
Berks       -0.66
Administ    -0.59
''          -0.59
âĶĢ         -0.59
().         -0.59
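Entries such as ãĢĮ and âĶĢ in the negative-logit list look like byte-level BPE token strings rather than ordinary text. The sketch below assumes the underlying tokenizer uses the GPT-2 style byte-to-character table, which this page does not state; under that assumption it maps a displayed token string back to readable text. The helper name `decode_token` is hypothetical and introduced only for illustration.

```python
def bytes_to_unicode():
    """Byte -> printable-character table used by GPT-2 style byte-level BPE."""
    bs = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    cs = bs[:]
    n = 0
    for b in range(256):
        if b not in bs:
            # Bytes without a printable form are shifted to code points >= 256.
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, map(chr, cs)))


# Inverse table: displayed character -> original byte value.
_UNICODE_TO_BYTE = {ch: b for b, ch in bytes_to_unicode().items()}


def decode_token(token: str) -> str:
    """Hypothetical helper: recover readable text from a displayed token string.

    Assumes GPT-2 style byte-level encoding. Raises KeyError for characters
    outside that table, such as the U+FFFD replacement character.
    """
    raw = bytes(_UNICODE_TO_BYTE[ch] for ch in token)
    # Partial multi-byte sequences decode to U+FFFD instead of raising.
    return raw.decode("utf-8", errors="replace")


if __name__ == "__main__":
    for tok in ["ãĢĮ", "âĶĢ", "Berks"]:
        print(repr(tok), "->", repr(decode_token(tok)))
```

If the assumption holds, ãĢĮ corresponds to the CJK opening corner bracket 「 and âĶĢ to the box-drawing character ─. The two ãĢĮ rows carry different values, so they are presumably distinct vocabulary entries whose difference (for example, leading whitespace) did not survive in this listing.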
Positive Logits
"           1.49
"?          1.42
"!          1.40
",          1.38
"-          1.37
")          1.35
".          1.28
"—          1.27
";          1.25
").         1.25
Activations Density 0.307%