INDEX
Explanations
references to pubs or pubs-related terms
mentions of pubs
Negative Logits
FUL       -0.82
IRD       -0.79
OHN       -0.78
llor      -0.76
EFF       -0.73
Gaia      -0.72
xual      -0.71
GOODMAN   -0.70
Isles     -0.68
ylum      -0.66
Positive Logits
lique     1.40
lisher    1.38
lishing   1.37
lishes    1.36
lish      1.35
escent    1.22
lik       1.08
bing      1.06
bed       0.97
keys      0.90
Activations Density 0.015%