INDEX
Explanations
verbs related to negative effects or consequences
Negative Logits
Token        Logit
sidx         -0.66
cius         -0.62
injunction   -0.57
Pwr          -0.57
bang         -0.57
stead        -0.56
Sina         -0.55
crawl        -0.55
Om           -0.54
Observer     -0.52
Positive Logits
Token        Logit
ing          2.57
ed           1.33
ions         1.29
edIn         1.25
ING          1.21
ively        1.19
ment         1.18
ingly        1.17
ging         1.17
ation        1.16
Activations Density: 0.197%