Explanations
identity, existence, feelings
Negative Logits
vests           0.50
advertisements  0.46
hamburgers      0.46
soybeans        0.45
necessitated    0.43
advertising     0.43
strollers       0.41
Cent            0.40
showrooms       0.40
Augusta         0.40
Positive Logits
stesso     0.54
stessa     0.51
włas       0.51
fratello   0.51
identité   0.50
чувства    0.49
Self       0.49
esistenza  0.48
스스로     0.46
když       0.45
Activations Density: 0.001%