INDEX
Explanations
abbreviated scientific terminology and acronyms
Negative Logits
the          -0.65
theore       -0.63
۔            -0.61
tradition    -0.60
tafel        -0.53
raiſ         -0.53
myſelf       -0.53
they         -0.53
deſt         -0.52
ization      -0.52
Positive Logits
sies         0.45
hes          0.44
ses          0.44
ais          0.43
rs           0.43
r            0.42
dos          0.42
tis          0.41
es           0.41
tic          0.41
Activations Density: 1.247%