INDEX
Explanations
computer-generated gibberish and scientific terms
text related to systems and comparisons
technical documents
Negative Logits
val       -0.56
Lu        -0.52
She       -0.51
bidden    -0.51
i         -0.51
gu        -0.49
lain      -0.48
vid       -0.48
Ter       -0.48
Goodwin   -0.47
Positive Logits
itſelf     1.00
ſelf       0.96
ſelves     0.96
myſelf     0.93
deſt       0.91
pleaſure   0.91
ſtate      0.89
Monfieur   0.88
whoſe      0.88
preſent    0.88
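The logit lists rank vocabulary tokens by how strongly the feature pushes their logits up or down. Dashboards like this one commonly derive such lists by projecting the feature's decoder direction through the model's unembedding matrix and sorting the result. The sketch below assumes that method. The helper name `top_logit_tokens` and the d_model x vocab_size layout of the unembedding are hypothetical. It returns raw dot products with no normalization, so its numbers are not directly comparable to the scores listed above.

```python
from typing import Sequence


def top_logit_tokens(
    decoder_direction: Sequence[float],
    unembedding: Sequence[Sequence[float]],
    vocab: Sequence[str],
    k: int = 10,
) -> tuple[list[tuple[str, float]], list[tuple[str, float]]]:
    """Return the k most-promoted and k most-suppressed tokens.

    Hypothetical helper (an assumption, not a documented API): `unembedding`
    is laid out as d_model rows by vocab_size columns, and each token's
    score is dot(decoder_direction, unembedding[:, token]).
    """
    d_model = len(decoder_direction)
    # Direct logit effect of the feature's decoder direction on every token.
    effects = [
        sum(decoder_direction[i] * unembedding[i][t] for i in range(d_model))
        for t in range(len(vocab))
    ]
    # Sort ascending by effect: most negative first.
    ranked = sorted(zip(vocab, effects), key=lambda pair: pair[1])
    negative = ranked[:k]        # strongest suppression first
    positive = ranked[::-1][:k]  # strongest promotion first
    return positive, negative
```

On a toy vocabulary `["a", "b", "c"]` with decoder direction `[1.0, 0.5]` and unembedding `[[1.0, -1.0, 0.0], [0.0, 2.0, -2.0]]`, calling it with `k=1` gives `[("a", 1.0)]` as the positive list and `[("c", -1.0)]` as the negative list.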
Activation Density: 2.817%
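Activation density summarizes how often the feature fires. This is a minimal sketch, assuming density means the percentage of sampled tokens whose activation is strictly above zero. The threshold and token sample behind the 2.817% figure are not stated here, and `activation_density` is a hypothetical helper.

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose activation exceeds `threshold`.

    Hypothetical helper; the default threshold of 0.0 is an assumption.
    """
    total = 0
    active = 0
    for value in activations:
        total += 1
        if value > threshold:  # count the feature as firing on this token
            active += 1
    if total == 0:
        raise ValueError("no activations provided")
    return 100.0 * active / total
```

For example, `activation_density([0.0, 0.0, 3.2, 0.0, 1.1, 0.0, 0.0, 0.0])` returns `25.0`, because the feature is active on two of the eight tokens.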