Explanations
connections or relationships in scientific or technical contexts
Negative Logits
S    -1.03
J    -0.98
T    -0.94
o    -0.93
N    -0.92
S    -0.92
K    -0.91
D    -0.90
P    -0.89
E    -0.89
Positive Logits
themſelves    1.63
myſelf        1.58
itſelf        1.55
pleaſure      1.48
ſelves        1.46
་་            1.46
Theſe         1.44
Monfieur      1.44
Efq           1.41
―――――         1.41
Activations Density 1.024%