Explanations
statements providing explanations or descriptions of concepts or observations
scholarly assertions and citations
Negative Logits
-0.69
wives
-0.64
condol
-0.64
Cele
-0.61
fired
-0.59
ranged
-0.58
luck
-0.58
clicked
-0.58
naissance
-0.58
thumbnails
-0.57
Positive Logits
convinc       0.94
extensively   0.85
empir         0.82
rhet          0.81
quotes        0.81
"…            0.81
anecdotes     0.78
explicitly    0.77
quoting       0.76
"...          0.75
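The logit lists above rank vocabulary tokens by the feature's direct effect on their output logits. A common way to compute such lists (an assumption here, since the page does not state its method) is to project the feature's decoder direction through the model's unembedding matrix and keep the k most positive and k most negative entries. A minimal NumPy sketch; the function and argument names are hypothetical:

```python
import numpy as np


def logit_weights(decoder_dir, unembed):
    """Direct effect of one feature on every vocabulary logit.

    Assumes decoder_dir has shape (d_model,) and unembed has shape
    (d_model, d_vocab); the result has shape (d_vocab,).
    """
    return decoder_dir @ unembed


def top_and_bottom_logits(weights, vocab, k=10):
    """Return the k most positive and k most negative (token, weight) pairs."""
    order = np.argsort(weights)  # ascending, so most negative first
    bottom = [(vocab[i], float(weights[i])) for i in order[:k]]
    top = [(vocab[i], float(weights[i])) for i in order[::-1][:k]]
    return top, bottom
```

For example, with toy weights `[-0.5, 0.9, -0.2, 0.4]` over tokens `["a", "b", "c", "d"]` and `k=2`, the positive list is `b, d` and the negative list is `a, c`.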
Activation density: 0.303%
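Activation density is usually read as the fraction of sampled tokens on which the feature fires, so 0.303% corresponds to roughly 3 tokens in 1,000. A minimal sketch, assuming "fires" means an activation strictly above a threshold of zero (the threshold and the name `activation_density` are hypothetical):

```python
import numpy as np


def activation_density(activations, threshold=0.0):
    """Fraction of tokens whose feature activation exceeds threshold."""
    acts = np.asarray(activations, dtype=float)
    return float((acts > threshold).mean())
```

For instance, 1,000 token activations of which exactly 3 are positive give a density of 0.003, i.e. 0.3%.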