INDEX
Explanations
terms related to size or comparison in context
Negative Logits
Token   Logit
T       -0.71
mit     -0.69
        -0.66
Mat     -0.63
Z       -0.61
j       -0.61
k       -0.61
Mil     -0.60
f       -0.60
Bad     -0.58
Positive Logits
Token        Logit
leſs         1.26
myſelf       1.25
itſelf       1.17
himſelf      1.16
theless      1.15
ſelf         1.14
Anſ          1.11
themſelves   1.06
reaſon       1.06
wiſe         1.06
Activations Density 0.124%
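
For readers unfamiliar with the density figure, the sketch below shows one common way such a number could be computed: the fraction of token positions at which the feature's activation exceeds a threshold. This is an assumption about the definition, not something the dashboard states. The function name activation_density, the threshold parameter, and the sample data are all hypothetical and chosen only to reproduce the scale of the figure above.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumed definition: density = (# active positions) / (# positions).
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 124 active positions out of 100,000 tokens.
sample = [1.0] * 124 + [0.0] * 99_876
density = activation_density(sample)
print(f"{density:.3%}")
```

Under this assumed definition, a density of 0.124% would mean the feature is active on roughly one token in every 800.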