Explanations
phrases indicating the presence or implementation of established procedures or systems
Negative Logits
gee     -0.76
etti    -0.72
anche   -0.71
yssey   -0.70
odge    -0.67
DRAG    -0.66
zzy     -0.66
ourge   -0.65
Ging    -0.65
zees    -0.64
Positive Logits
whereby      0.83
bos          0.80
alities      0.72
ifice        0.64
atives       0.64
incentiv     0.64
antioxid     0.61
protections  0.61
bodied       0.60
ative        0.59
Activations Density: 0.031%