INDEX
Negative Logits
apologies       -0.68
representation  -0.64
equivalents     -0.58
compromise      -0.57
ambassadors     -0.57
conventions     -0.56
elimination     -0.56
Wonderland      -0.55
pleasure        -0.55
audiences       -0.55
Positive Logits
oglu    1.13
onen    1.07
chuk    1.05
hani    1.02
kov     1.02
zinski  1.01
gui     1.01
arov    1.00
iev     0.99
iani    0.97
Activations Density: 0.249%