Explanations
fightthenewdrug.org and harmful effects
Negative Logits
somewhat      0.46
counseling    0.44
status        0.42
journaling    0.41
therapy       0.40
terapeut      0.40  ("therapist")
priest        0.39
নিত           0.39
occasionally  0.39
偶尔          0.39  ("occasionally")
Positive Logits
harms         0.54
destabil      0.53
zwycię        0.50  (fragment of Polish "zwycięstwo", "victory")
icznej        0.48
inhuman       0.48
injustice     0.47
undermines    0.46
harmful       0.45
beatable      0.45
needlessly    0.45
Activation density: 0.057%