INDEX
NEGATIVE LOGITS
ses          -0.17
rador        -0.12
achusetts    -0.11
odore        -0.11
pired        -0.11
woke         -0.11
teenth       -0.11
ductive      -0.11
levard       -0.11
wealth       -0.10
POSITIVE LOGITS
oret          0.22
orem          0.22
ories         0.20
owing         0.19
existent      0.17
oretical      0.16
linear        0.16
teen          0.16
neath         0.16
iquement      0.16
Activations Density: 0.284%