Explanations
expressions of gratitude directed towards individuals
Negative Logits
lassen    -0.17
leans     -0.17
rada      -0.16
apr       -0.16
ondo      -0.15
usto      -0.15
oro       -0.15
iston     -0.15
ampo      -0.14
ÂĢÂ       -0.14
Positive Logits
aliz          0.20
uu            0.19
istrovství    0.18
們            0.16
yers          0.16
elson         0.15
/us           0.15
gere          0.15
’re           0.14
ございます    0.14
Activations Density 0.010%