Explanations
themes related to emotional states and social interactions
Negative Logits
due       -0.22
due       -0.21
嘛        -0.18
Due       -0.17
_due      -0.17
thanks    -0.17
uld       -0.16
Due       -0.16
olk       -0.15
thers     -0.15
Positive Logits
because   0.41
because   0.38
porque    0.36
Because   0.36
Because   0.36
，因为    0.34
因为      0.33
perché    0.32
потому    0.32
ecause    0.28
Activations Density 0.275%