INDEX
Explanations
terms related to lifestyle and healthcare
Negative Logits
J     -0.84
S     -0.81
e     -0.78
o     -0.76
O     -0.76
in    -0.76
T     -0.75
i     -0.75
E     -0.73
      -0.73
Positive Logits
myſelf        1.53
pleaſure      1.49
itſelf        1.47
Theſe         1.44
Diſ           1.43
Efq           1.41
Anſ           1.41
viſ           1.40
themſelves    1.39
ſche          1.38
Activations Density: 0.365%