INDEX
Negative Logits
Token              Logit
natureconservancy  -0.70
MODE               -0.69
cumbers            -0.60
Zen                -0.59
omething           -0.58
agre               -0.58
BLE                -0.57
ombat              -0.55
iege               -0.55
ource              -0.55
Positive Logits
Token              Logit
tered              1.10
icia               1.00
itia               1.00
ting               0.97
tering             0.92
downs              0.78
hetically          0.78
ugal               0.75
inous              0.71
ingly              0.71
Activations Density 0.025%