Explanations
concerns or worries expressed in text
recurring themes of concern
Negative Logits
RIP          -0.77
OUP          -0.76
Sort         -0.74
redits       -0.73
haw          -0.72
salute       -0.71
toast        -0.68
Doodle       -0.68
å§           -0.68
precincts    -0.64
Positive Logits
severe       0.80
cens         0.78
developing   0.75
preventing   0.73
mounting     0.72
inadequ      0.72
excessive    0.71
undue        0.71
litigation   0.71
heightened   0.70
Activations Density: 0.134%