Explanations
phrases indicating high levels of concern or urgency
Negative Logits
нич        -0.15
itself     -0.15
cl         -0.15
erties     -0.14
ensible    -0.14
しの       -0.14
ällt       -0.14
wer        -0.14
ivery      -0.14
olet       -0.14
Positive Logits
Breed      0.16
urator     0.16
Dodd       0.16
Rip        0.15
ея         0.15
uda        0.14
odata      0.14
kad        0.14
ře         0.14
ason       0.14
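Each token in the two lists carries a small signed score. Here is a minimal sketch of how lists like these are commonly produced. It assumes, though this page does not say so, that each score is the dot product of the feature's decoder direction with that token's column of the model's unembedding matrix. The function `logit_effects` and the toy numbers are hypothetical.

```python
from typing import List, Sequence, Tuple


def logit_effects(
    decoder_dir: Sequence[float],
    unembed: Sequence[Sequence[float]],  # assumed shape: d_model x vocab_size
    vocab: Sequence[str],
    k: int,
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most negative and the k most positive (token, score) pairs."""
    # Score each token by projecting the (hypothetical) decoder direction
    # onto that token's unembedding column.
    scores = [
        sum(decoder_dir[r] * unembed[r][v] for r in range(len(decoder_dir)))
        for v in range(len(vocab))
    ]
    ranked = sorted(zip(vocab, scores), key=lambda pair: pair[1])
    negative = ranked[:k]        # lowest scores, most negative first
    positive = ranked[::-1][:k]  # highest scores, most positive first
    return negative, positive


# Toy example: a 2-dimensional model with a 3-token vocabulary.
neg, pos = logit_effects([1.0, -0.5], [[0.2, -0.1, 0.0], [0.0, 0.4, -0.2]], ["a", "b", "c"], k=1)
print(neg, pos)
```

Under this assumption, a token appears in the negative list when the feature pushes its logit down, and in the positive list when the feature pushes it up.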
Activation Density: 0.046%
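Assume density means the share of sampled tokens on which the feature activates, which is an assumption here. Under that reading, 0.046% corresponds to roughly 46 of every 100,000 tokens, or about 1 in 2,200. The sketch below makes that arithmetic concrete. The function `activation_density`, its zero threshold, and the toy data are hypothetical.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Fraction of tokens whose feature activation exceeds `threshold` (assumed definition)."""
    if not activations:
        raise ValueError("need at least one activation")
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Toy data: 46 firing tokens out of 100,000 sampled tokens.
acts = [0.0] * 99_954 + [1.2] * 46
print(f"{activation_density(acts):.3%}")  # prints 0.046%
```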