Explanations
action verbs related to negative activities or events
actions associated with systemic disruption or control
Negative Logits
Token       Logit
grouping    -0.69
nton        -0.69
ijk         -0.68
say         -0.61
peria       -0.61
eka         -0.60
zip         -0.59
Discussion  -0.59
confir      -0.58
fitting     -0.58
Positive Logits
Token       Logit
GGGGGGGG    0.73
lass        0.73
redients    0.72
ducks       0.71
Edge        0.70
edIn        0.68
Squid       0.64
rods        0.63
edge        0.63
pige        0.62
Activations Density 0.353%