INDEX
Explanations
Phrases containing personal titles, such as "Dr.", together with names
References to a specific individual or persona
Negative Logits
Token        Logit
ulative      -0.55
ependent     -0.55
staking      -0.54
cipled       -0.53
utenberg     -0.50
":[{"        -0.50
matical      -0.49
Kejriwal     -0.49
untled       -0.49
varied       -0.48
Positive Logits
Token        Logit
.).           0.67
.''           0.63
ensis         0.59
*.            0.58
".            0.58
'.            0.57
''.           0.55
iani          0.55
.</           0.55
%.            0.52
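The token lists above rank vocabulary items by how strongly the feature pushes their logits down or up. A common convention for SAE feature dashboards, assumed here and not confirmed by this page, is to project the feature's decoder direction through the model's unembedding matrix and take the most negative and most positive entries. The sketch below illustrates that convention; the function name `top_logit_tokens`, the shapes of `w_dec_row` and `W_U`, and the toy vocabulary and weights are all hypothetical.

```python
import numpy as np

def top_logit_tokens(w_dec_row, W_U, vocab, k=10):
    """Return the k most negative and k most positive (token, logit) pairs
    for a feature whose decoder direction is w_dec_row.

    Assumed shapes (hypothetical): w_dec_row is (d_model,),
    W_U is (d_model, vocab_size), and vocab has vocab_size entries.
    """
    logits = w_dec_row @ W_U               # direct effect on each token's logit
    order = np.argsort(logits)             # indices sorted from lowest to highest
    negative = [(vocab[i], float(logits[i])) for i in order[:k]]
    positive = [(vocab[i], float(logits[i])) for i in order[::-1][:k]]
    return negative, positive

# Toy example: a hypothetical 4-token vocabulary with d_model = 2.
vocab = ["ulative", ".).", "ensis", "Kejriwal"]
W_U = np.array([[-1.0, 2.0, 1.0, -0.5],
                [ 0.0, 1.0, 0.5,  0.0]])
w_dec_row = np.array([0.5, 0.1])
neg, pos = top_logit_tokens(w_dec_row, W_U, vocab, k=2)
print("negative:", neg)
print("positive:", pos)
```

Real dashboards may also normalize the decoder direction or fold in a final layer norm before projecting; this sketch does neither.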
Activation Density: 2.507%
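Activation density is usually read as the fraction of token positions on which the feature fires. At 2.507%, that is roughly 25 of every 1,000 tokens. The sketch below assumes "fires" means an activation strictly above zero. That threshold, the helper name `activation_density`, and the synthetic activations are illustrative assumptions, not details taken from this page.

```python
import numpy as np

def activation_density(acts, threshold=0.0):
    """Fraction of token positions on which a feature is active.

    acts: the feature's activations, one per token position (any shape;
    it is flattened). A position counts as active when its activation is
    strictly greater than `threshold` (assumed to be 0 here).
    """
    acts = np.asarray(acts, dtype=float).ravel()
    return float(np.mean(acts > threshold))

# Synthetic activations: the feature fires on 25 of 1,000 positions.
acts = np.zeros(1000)
acts[:25] = 1.3
density = activation_density(acts)
print(f"Activation Density: {density:.3%}")
```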