INDEX
Explanations
references to philosophical concepts and debates
Negative Logits
úb        -0.17
ego       -0.15
eka       -0.15
Alto      -0.14
mej       -0.14
odu       -0.14
ushman    -0.14
rud       -0.13
ittance   -0.13
æ¶        -0.13
Positive Logits
stress      0.18
viewing     0.17
saw         0.17
view        0.16
ienne       0.16
talk        0.16
speaking    0.16
associates  0.16
treat       0.15
seeing      0.15
Activations Density: 0.210%