Explanations
references to authors or writers
mentions of authorship or attribution in text
Negative Logits
bia        -0.88
MpServer   -0.80
asy        -0.76
iasm       -0.76
aser       -0.75
pect       -0.74
vous       -0.73
ounter     -0.72
apor       -0.71
ouple      -0.71
Positive Logits
virtue     0.92
Richard    0.78
Wizards    0.78
Warren     0.77
Hasan      0.77
Michele    0.74
Robert     0.74
Hug        0.74
Rod        0.74
Juan       0.74
Activations Density 0.089%