INDEX
Explanations
references to the concept of "objectivity" and related philosophical terms
Negative Logits
aber         -0.16
izu          -0.15
uen          -0.15
avigator     -0.15
agra         -0.14
ackers       -0.14
fm           -0.14
inston       -0.14
iden         -0.14
itational    -0.14
Positive Logits
ively          0.26
ors            0.18
alist          0.18
hood           0.17
ivity          0.16
ives           0.16
ually          0.16
andalone       0.15
ponge          0.15
Revolutionary  0.15
Activations Density 0.070%