NEGATIVE LOGITS
theless      -0.79
Halls        -0.64
resemblance  -0.61
Burg         -0.57
Mock         -0.56
Heritage     -0.55
bachelor     -0.55
Purg         -0.53
Freem        -0.52
redundancy   -0.52
POSITIVE LOGITS
oths   1.36
apy    1.22
iled   1.11
bered  1.10
othe   1.08
aps    1.07
othes  1.07
bs     0.98
far    0.98
oooo   0.97
Activations Density: 0.060%