INDEX
Explanations
references to specific sources or citations within a text
citations or references to sources and studies
Negative Logits
apon        -0.80
soever      -0.74
ieu         -0.66
Sphere      -0.61
pires       -0.60
UME         -0.59
Enter       -0.59
Females     -0.58
Veg         -0.57
Fuck        -0.57
Positive Logits
precedent    0.89
preced       0.89
similarities 0.85
playbook     0.85
Cosponsors   0.84
sources      0.80
anecdotal    0.78
example      0.77
inaccur      0.77
firsthand    0.75
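The two lists above are the ten tokens with the most negative and the most positive scores for this feature. Below is a minimal sketch of that selection step. It assumes each listed value is a per-token score, for example the feature's direct effect on that token's output logit. How the dashboard actually computes the score is an assumption and is not stated on this page. The function name `top_logits` is hypothetical, and the sample input reuses a few of the values shown above.

```python
import heapq


def top_logits(effects: dict[str, float], k: int = 10):
    """Return the k most negative and k most positive (token, score) pairs.

    Hypothetical helper: assumes `effects` maps each vocabulary token to a
    scalar score for this feature (how that score is computed is an assumption).
    """
    # nsmallest returns pairs in ascending order (most negative first).
    negative = heapq.nsmallest(k, effects.items(), key=lambda kv: kv[1])
    # nlargest returns pairs in descending order (most positive first).
    positive = heapq.nlargest(k, effects.items(), key=lambda kv: kv[1])
    return negative, positive


# A few of the values listed above, used as sample input.
sample = {
    "apon": -0.80,
    "soever": -0.74,
    "Enter": -0.59,
    "precedent": 0.89,
    "sources": 0.80,
    "firsthand": 0.75,
}

neg, pos = top_logits(sample, k=2)
print(neg)  # [('apon', -0.8), ('soever', -0.74)]
print(pos)  # [('precedent', 0.89), ('sources', 0.8)]
```

Over a full vocabulary, `heapq` avoids sorting every token when only the top `k` are needed.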
Activation Density: 0.430%
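Activation density is reported here as a percentage. Below is a minimal sketch of one plausible definition: the share of token positions where the feature's activation is above zero. Both the definition and the zero threshold are assumptions, not taken from this page. The function name `activation_density` and the activation values are hypothetical, chosen only so the result matches the 0.430% figure.

```python
def activation_density(activations: list[float], threshold: float = 0.0) -> float:
    """Percentage of token positions whose activation exceeds `threshold`.

    Hypothetical helper: assumes density means "fraction of tokens where the
    feature fires", with firing defined as activation > threshold.
    """
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical activations: 43 active positions out of 10,000 tokens.
acts = [0.0] * 9957 + [1.0] * 43
print(f"{activation_density(acts):.3f}%")  # 0.430%
```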