INDEX
Explanations
names of individuals, potentially notable figures in various fields
proper nouns or names of people
Negative Logits
̶           -0.64
..."        -0.63
AAA         -0.63
â̦"         -0.58
AME         -0.58
stall       -0.57
OTAL        -0.56
Roaming     -0.55
.ãĢį        -0.55
NAT         -0.53
Positive Logits
enegger     0.92
hower       0.85
anyahu      0.83
testified   0.80
meier       0.78
hner        0.78
igham       0.75
meyer       0.74
joked       0.73
arnaev      0.73
Activations Density 0.282%