INDEX
Explanations
phrases related to expressing concern or worry
instances of the word "concerned."
NEGATIVE LOGITS
artifacts   -0.76
Bom         -0.73
avorite     -0.69
arb         -0.68
ingers      -0.67
ingen       -0.67
lite        -0.65
ety         -0.65
sword       -0.64
obs         -0.64
POSITIVE LOGITS
trolling     0.78
reon         0.76
lessly       0.73
ingly        0.71
ienced       0.69
wart         0.69
atives       0.69
cerned       0.68
iversal      0.68
edly         0.67
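Feature dashboards of this kind commonly build the positive and negative logit lists by projecting the feature's decoder direction through the model's unembedding matrix and keeping the tokens with the largest and smallest resulting weights. The pure-Python sketch below assumes that method. It is not taken from this dashboard's own code, and `logit_weights`, `top_logits`, and the toy matrices are hypothetical.

```python
from typing import List, Sequence, Tuple


def logit_weights(decoder_dir: Sequence[float],
                  unembed: Sequence[Sequence[float]]) -> List[float]:
    """Project a feature's decoder direction (length d_model) through an
    unembedding matrix stored as d_model rows of vocab_size columns,
    giving one logit weight per vocabulary token (assumed method)."""
    vocab_size = len(unembed[0])
    return [
        sum(decoder_dir[i] * unembed[i][t] for i in range(len(decoder_dir)))
        for t in range(vocab_size)
    ]


def top_logits(weights: Sequence[float], vocab: Sequence[str], k: int
               ) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most negative and the k most positive (token, weight)
    pairs, each ordered from most extreme to least extreme."""
    k = max(0, min(k, len(weights)))  # clamp so k=0 yields empty lists
    order = sorted(range(len(weights)), key=lambda t: weights[t])
    negative = [(vocab[t], weights[t]) for t in order[:k]]
    positive = [(vocab[t], weights[t]) for t in reversed(order[len(order) - k:])]
    return negative, positive


# Toy example: d_model = 2, vocabulary of 4 tokens (all values hypothetical).
decoder_dir = [1.0, -0.5]
unembed = [
    [0.5, -0.25, 1.0, 0.0],
    [0.5, 0.5, -1.0, 2.0],
]
vocab = ["a", "b", "c", "d"]

weights = logit_weights(decoder_dir, unembed)
negative, positive = top_logits(weights, vocab, k=2)
print(negative)  # [('d', -1.0), ('b', -0.5)]
print(positive)  # [('c', 1.5), ('a', 0.25)]
```

Under this assumption, the 0.68 weight on `cerned` would mean the feature's direct effect pushes the model toward predicting that token, which fits the "concerned" explanation.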
Activations Density: 0.029%
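Activation density is usually reported as the share of tokens in a sample on which the feature fires. At 0.029%, that works out to roughly 29 tokens per 100,000. A minimal Python sketch, assuming "fires" means an activation strictly greater than zero (the `threshold` parameter and the sample data are hypothetical):

```python
from typing import Sequence


def activation_density(activations: Sequence[float],
                       threshold: float = 0.0) -> float:
    """Fraction of token positions whose activation is strictly above
    `threshold` (threshold = 0.0 is an assumed definition of "firing")."""
    if not activations:
        raise ValueError("need at least one activation")
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Hypothetical sample: the feature fires on 29 of 100,000 tokens.
acts = [0.0] * 99_971 + [1.3] * 29
density = activation_density(acts)
print(f"{density:.3%}")  # 0.029%
```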