INDEX
Explanations
references to authors or writers and their respective publications
sentences that indicate professional roles or identities
Negative Logits
stamped         -0.77
onga            -0.76
extinguished    -0.76
trap            -0.68
toast           -0.67
orderly         -0.67
brut            -0.67
chopping        -0.66
alley           -0.66
aceae           -0.63
Positive Logits
Previously      1.13
Follow          0.97
(blank)         0.93
Readers         0.92
Currently       0.90
Its             0.90
Originally      0.89
He              0.87
Visit           0.86
His             0.85
Activations Density 0.167%