Explanations
phrases containing the word "concerned"
expressions of concern
Negative Logits
avorite     -0.77
iller       -0.70
artifacts   -0.66
ingen       -0.65
ety         -0.62
alter       -0.61
apult       -0.61
Bom         -0.60
jug         -0.60
egal        -0.60
Positive Logits
lessly       0.92
ingly        0.83
trolling     0.82
ienced       0.79
ately        0.78
ativity      0.74
edly         0.73
ially        0.71
atives       0.70
bells        0.70
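Logit lists like the ones above rank vocabulary tokens by how strongly a feature pushes their output logits up (positive) or down (negative). The sketch below shows one common way to produce such rankings, assuming the effect on token t is the dot product of the feature's decoder direction with column t of the model's unembedding matrix. That formula, the helper names `logit_effects` and `top_and_bottom`, and all numbers are illustrative assumptions, not the dashboard's confirmed method or data.

```python
# Sketch: rank vocabulary tokens by a feature's direct logit effect.
# ASSUMPTIONS (not taken from the dashboard): the effect on token t is the dot
# product of the feature's decoder direction with column t of the unembedding
# matrix; helper names and inputs below are hypothetical toy values.
from typing import List, Sequence, Tuple


def logit_effects(decoder_dir: Sequence[float],
                  unembed: Sequence[Sequence[float]]) -> List[float]:
    """Return decoder_dir . unembed[:, t] for every token column t.

    `unembed` is laid out as d_model rows by vocab_size columns.
    """
    vocab_size = len(unembed[0])
    return [
        sum(decoder_dir[k] * unembed[k][t] for k in range(len(decoder_dir)))
        for t in range(vocab_size)
    ]


def top_and_bottom(effects: Sequence[float], tokens: Sequence[str],
                   k: int) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most positive and k most negative (token, effect) pairs."""
    order = sorted(range(len(effects)), key=lambda t: effects[t])  # ascending
    positive = [(tokens[t], effects[t]) for t in order[::-1][:k]]
    negative = [(tokens[t], effects[t]) for t in order[:k]]
    return positive, negative


# Hypothetical 2-dimensional residual stream and 4-token vocabulary.
tokens = ["lessly", "avorite", "ingly", "bells"]
decoder_dir = [1.0, 0.5]
unembed = [
    [0.9, -0.7, 0.2, 0.1],
    [0.2, -0.1, 0.0, 0.4],
]

effects = logit_effects(decoder_dir, unembed)
positive, negative = top_and_bottom(effects, tokens, k=2)
print("positive:", positive)
print("negative:", negative)
```

With real weights, `decoder_dir` would be the feature's row of the autoencoder decoder and `unembed` the model's unembedding matrix; a dashboard may additionally normalize or scale these values, so the toy magnitudes here are not comparable to the listed ones.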
Activation Density: 0.034%
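An activation density of 0.034% means the feature fires on roughly 34 of every 100,000 tokens, so it is a very sparse feature. The sketch below computes density as the fraction of sampled activations strictly above zero. The zero threshold, the `activation_density` helper, and the sample data are assumptions for illustration; the dashboard does not state its exact sampling procedure or threshold.

```python
# Sketch: activation density = share of sampled tokens on which a feature fires.
# ASSUMPTION: "fires" means activation strictly above a threshold of 0.0; the
# dashboard's exact sampling and threshold are not stated.
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Fraction of activations strictly greater than `threshold`."""
    if len(activations) == 0:
        raise ValueError("need at least one activation")
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Hypothetical sample: 100,000 tokens, 34 of which activate the feature.
sample = [0.0] * 99_966 + [1.2] * 34
density = activation_density(sample)
print(f"{density:.3%}")  # 34 / 100,000 = 0.034%
```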