INDEX
Explanations
instances of self-reflection and personal insights
Negative Logits
Token         Logit
Gad           -0.66
antitrust     -0.62
Canaver       -0.60
icist         -0.60
optics        -0.58
extant        -0.58
Rockefeller   -0.57
Plaint        -0.57
Millennium    -0.57
Consortium    -0.56
Positive Logits
Token         Logit
'm            1.65
've           1.35
'll           1.22
am            1.15
'd            1.12
guess         1.06
owe           1.02
adore         1.02
swear         1.02
don           1.01
Activations Density: 0.266%