INDEX
Explanations
words related to events or actions with significant impact or consequences
terms related to accusations or labeling someone
Negative Logits
amen        -0.63
VD          -0.62
dynamic     -0.60
gathering   -0.56
vic         -0.56
imper       -0.55
Dynamic     -0.55
juggling    -0.55
Nation      -0.55
trust       -0.55
Positive Logits
lled        4.85
lling       2.42
lly         1.34
ller        1.30
led         1.26
wered       1.24
llers       1.23
gged        1.23
cled        1.12
lished      1.12
Activations Density 0.004%