INDEX
Explanations
texts referring to comparisons or similarities
Negative Logits
ennes      -0.78
Published  -0.77
inas       -0.76
iets       -0.71
hiba       -0.70
inion      -0.68
ione       -0.67
overy      -0.66
anthrop    -0.65
ilic       -0.65
POSITIVE LOGITS
lihood     2.04
liest      1.45
lier       1.37
minded     1.14
minded     1.12
liness     1.07
clock      0.83
ours       0.82
wildfire   0.80
ability    0.78
Activations Density 0.973%