INDEX
Explanations
proper nouns or technical terms
references to scientific concepts, terms, or laws
Negative Logits
hesda        -0.73
inclined     -0.71
etheless     -0.66
muted        -0.66
hesitant     -0.65
faculties    -0.65
tempted      -0.64
disarm       -0.64
truthful     -0.64
aples        -0.63
Positive Logits
"—           0.76
".           0.75
Kare         0.72
Geh          0.72
",           0.71
*.           0.69
Versus       0.68
Emer         0.66
itis         0.65
Cop          0.65
Activation Density: 0.580%