Explanations
people's names with the substring "ins"
Negative Logits
ploma         -0.62
Achieve       -0.60
Governments   -0.59
¥µ            -0.59
Prime         -0.58
Proposition   -0.58
sterdam       -0.57
DISTR         -0.57
shake         -0.56
compr         -0.56
Positive Logits
pection       1.24
urance        1.20
kaya          1.04
anity         0.99
poon          0.99
ufficient     0.98
piration      0.98
ensitive      0.97
pect          0.97
piring        0.97
Activations Density 0.024%