INDEX
Explanations
abbreviations or acronyms in the context of biological or scientific terms
Negative Logits
y         -0.68
s         -0.61
s         -0.60
-         -0.59
o         -0.58
M         -0.57
u         -0.57
h         -0.56
me        -0.56
M         -0.55
Positive Logits
myſelf    1.31
Eſ        1.21
pleaſure  1.19
raiſ      1.17
Conſ      1.16
Monfieur  1.16
ſelf      1.16
preſent   1.15
ſeveral   1.15
ſever     1.13
Activations Density 0.962%