Explanations
phrases or terms associated with specific physical sensations or actions
Negative Logits
Anſ        -0.91
Theſe      -0.85
Reſ        -0.85
úsqueda    -0.83
myſelf     -0.79
Houſe      -0.78
Diſ        -0.77
pleaſure   -0.75
Eſ         -0.75
Efq        -0.74
Positive Logits
la         0.70
si         0.64
pe         0.63
care       0.61
de         0.61
sa         0.56
une        0.50
sau        0.50
mas        0.49
co         0.49
Activations Density 0.029%