INDEX
Explanations
references to a specific place or person named "Pers."
references to specific individuals or entities
Negative Logits
nz           -0.76
akespe       -0.74
llah         -0.71
BILL         -0.70
iannopoulos  -0.68
ctors        -0.67
Leod         -0.67
masters      -0.66
aneers       -0.66
yrinth       -0.65
Positive Logits
istence      1.17
istent       1.09
iments       0.93
pect         0.91
idence       0.88
erver        0.88
Pers         0.87
ipher        0.84
ervation     0.84
hing         0.80
Activations Density: 0.011%