Explanations
proper names of people
Negative Logits
Token        Logit
awaru        -0.78
TOR          -0.76
exempt       -0.74
asks         -0.73
PLA          -0.72
regulated    -0.72
wrapper      -0.72
yrim         -0.71
leaders      -0.71
sites        -0.71
Positive Logits
Token        Logit
Moore         1.04
Allan         1.01
Mun           1.01
Griffith      0.99
Graham        0.99
Cohen         0.99
Herbert       0.99
Miller        0.98
Hayes         0.97
Lewis         0.97
Activation Density: 0.768%