INDEX
Explanations
phrases with the prefix "le-" followed by numbers
Negative Logits
ãĥ¼ãĥĨãĤ£    -0.85
����         -0.74
aneers       -0.69
âĶģ          -0.68
ĸļ           -0.66
ãģĨ          -0.65
ilities      -0.64
ãĥ¼ãĥĨ       -0.63
acca         -0.63
ruary        -0.62
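Several of the token strings listed above, such as `ãĥ¼ãĥĨ`, look like the byte-to-character spelling used by GPT-2-style byte-level BPE vocabularies rather than readable text. The page does not name the tokenizer, so this is an assumption. Under that assumption, the sketch below shows one way to turn such a string back into the text it stands for. `bytes_to_unicode` mirrors the helper of the same name in the published GPT-2 encoder; `decode_token` is a hypothetical helper introduced here for illustration.

```python
def bytes_to_unicode():
    """Byte -> printable character table, as in GPT-2's byte-level BPE encoder."""
    # Bytes that are already printable map to themselves.
    bs = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    cs = bs[:]
    n = 0
    # Every remaining byte is shifted to a code point at 256 and above.
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, (chr(c) for c in cs)))


# Reverse table: displayed character -> original byte value.
_CHAR_TO_BYTE = {ch: b for b, ch in bytes_to_unicode().items()}


def decode_token(token: str) -> str:
    """Hypothetical helper: map a displayed token back to the text it encodes.

    Assumes the token uses the GPT-2 byte-to-character spelling. Strings that
    contain characters outside that table are returned unchanged. Byte
    sequences that are not valid UTF-8 on their own (a token can hold only
    part of a multi-byte character) decode to U+FFFD replacement characters.
    """
    if any(ch not in _CHAR_TO_BYTE for ch in token):
        return token
    raw = bytes(_CHAR_TO_BYTE[ch] for ch in token)
    return raw.decode("utf-8", errors="replace")


if __name__ == "__main__":
    for tok in ["ãĥ¼ãĥĨãĤ£", "ãĥ¼ãĥĨ", "ãģĨ", "âĶģ", "aneers"]:
        print(repr(tok), "->", repr(decode_token(tok)))
```

Read this way, `ãĥ¼ãĥĨãĤ£` is the katakana sequence `ーティ`, `ãģĨ` is the hiragana `う`, and `âĶģ` is the box-drawing character `━`. Plain ASCII fragments such as `aneers` pass through unchanged.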
Positive Logits
opard        1.23
isure        1.06
icester      1.00
vered        1.00
vity         1.00
pid          0.99
gged         0.99
gging        0.98
yton         0.98
mons         0.97
Activations Density 0.018%