INDEX
Explanations
words related to discussion, debate, or argument
phrases related to moral or ethical implications
Negative Logits
recovered     -0.76
stood         -0.76
effects       -0.71
survives      -0.70
stabilized    -0.65
reusable      -0.65
survived      -0.64
ukong         -0.64
retained      -0.64
»Ĵ            -0.63
Positive Logits
folly          1.21
disingen       1.20
foolish        1.15
blasphemy      1.15
irresponsible  1.13
heresy         1.12
udicrous       1.11
delusional     1.10
dishon         1.10
insulting      1.08
Activations Density: 0.236%