Explanations
statements of fact or information in text
references to statements made in articles or reports
Negative Logits
ipal            -0.79
iffe            -0.79
Flavoring       -0.73
Sco             -0.68
atz             -0.66
asi             -0.65
suscept         -0.64
cv              -0.64
rax             -0.63
xtap            -0.63
Positive Logits
unequivocally    0.85
goodbye          0.81
rooms            0.78
lessness         0.70
spec             0.68
quo              0.67
less             0.67
emphatically     0.66
pec              0.65
ulate            0.65
Activation density: 0.028%