Explanations
names of individuals, possibly related to different contexts
Negative Logits
Token     Logit
ivia      -0.86
urity     -0.81
berus     -0.78
gdala     -0.77
ãĥĪ       -0.77
ctions    -0.77
ctory     -0.75
ãĥ¤       -0.75
alogue    -0.75
ãĥ¬       -0.74
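Three of the tokens in the negative list (ãĥĪ, ãĥ¤, ãĥ¬) look like byte-level BPE spellings rather than readable text. The sketch below shows one way to decode them, assuming the model uses a GPT-2 style byte-to-character table; the page does not say which tokenizer is in use, so treat that as an assumption. The helper names `bytes_to_unicode` and `decode_token` are illustrative, not part of the dashboard.

```python
def bytes_to_unicode():
    """Build the byte -> printable-character table used by GPT-2 style
    byte-level BPE vocabularies (assumed here, not confirmed by the page)."""
    # Bytes that are already printable map to themselves.
    bs = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    cs = bs[:]
    n = 0
    # Every remaining byte is shifted to an unused code point above 255.
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, map(chr, cs)))


def decode_token(token):
    """Map each displayed character back to its byte, then decode as UTF-8."""
    char_to_byte = {c: b for b, c in bytes_to_unicode().items()}
    raw = bytes(char_to_byte[ch] for ch in token)
    return raw.decode("utf-8", errors="replace")


if __name__ == "__main__":
    for tok in ["ãĥĪ", "ãĥ¤", "ãĥ¬", "robe"]:
        print(repr(tok), "->", repr(decode_token(tok)))
```

Under that assumption the three tokens decode to the katakana characters ト, ヤ, and レ, while plain ASCII tokens such as robe are unchanged.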
Positive Logits
Token     Logit
robe      1.70
ynski     0.96
ens       0.93
ages      0.85
hips      0.83
ings      0.82
ell       0.82
ling      0.80
age       0.79
lock      0.79
Activations Density 0.028%