INDEX
Explanations
terms related to societal structures or health and dietary factors
Negative Logits
J       -1.02
S       -0.99
O       -0.97
Z       -0.93
K       -0.92
E       -0.92
T       -0.92
o       -0.91
O       -0.90
she     -0.88
Positive Logits
myſelf        1.68
themſelves    1.65
Theſe         1.61
itſelf        1.59
purpoſe       1.55
Monfieur      1.55
pleaſure      1.53
Efq           1.48
faſt          1.47
ſeveral       1.45
Activations Density: 0.472%