INDEX
Explanations
phrases indicating comparisons or examples
Negative Logits
Token         Logit
istical       -0.83
idate         -0.78
fixed         -0.77
olate         -0.76
gap           -0.76
ettlement     -0.76
nown          -0.75
iminary       -0.75
enser         -0.72
ensibly       -0.72
Positive Logits
Token         Logit
Louie         0.92
Franz         0.89
Forrest       0.88
Alfred        0.86
Jasper        0.86
Sergio        0.86
Clive         0.86
Clint         0.85
Cowboy        0.85
Leonardo      0.84
Activations Density: 0.071%