Explanations
expressions of frustration and criticism regarding relationships and social interactions
Negative Logits
érica     -0.15
Nag       -0.15
suspects  -0.15
chet      -0.14
airo      -0.14
Holland   -0.14
ospace    -0.13
sel       -0.13
avel      -0.13
ins       -0.13
Positive Logits
simples    0.20
simple     0.19
.Simple    0.18
Simple     0.18
SIMPLE     0.18
ôi         0.18
amenti     0.17
Simple     0.17
simple     0.17
FileSync   0.16
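A minimal sketch of how ranked lists like these can be produced. It assumes each vocabulary token already has a scalar logit effect for this feature. How that score is computed (for example, by projecting the feature direction through the model's unembedding) is an assumption here, not something this page states. The function name `top_logit_effects` and the sample dictionary are hypothetical, and the scores are a few values copied from the lists above.

```python
import heapq


def top_logit_effects(effects: dict[str, float], k: int = 10):
    """Return the k most negative and k most positive (token, score) pairs.

    Hypothetical helper: `effects` maps each token to an assumed scalar
    logit effect for one feature. Ties keep their original insertion order.
    """
    negative = heapq.nsmallest(k, effects.items(), key=lambda kv: kv[1])
    positive = heapq.nlargest(k, effects.items(), key=lambda kv: kv[1])
    return negative, positive


# Hypothetical sample: a handful of scores taken from the lists above.
effects = {
    "simples": 0.20,
    "simple": 0.19,
    "FileSync": 0.16,
    "Nag": -0.15,
    "suspects": -0.15,
    "chet": -0.14,
}
neg, pos = top_logit_effects(effects, k=2)
print(neg)  # [('Nag', -0.15), ('suspects', -0.15)]
print(pos)  # [('simples', 0.2), ('simple', 0.19)]
```

A heap-based selection avoids sorting the whole vocabulary when only the top and bottom k entries are needed.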
Activation Density: 0.203%
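A minimal sketch to help read the density figure. It assumes density means the percentage of tokens on which the feature's activation is above zero. The function name, the `threshold` parameter, and the example data are hypothetical. Under that assumption, 0.203% means the feature fires on roughly 2 tokens in every 1,000.

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose activation exceeds `threshold` (hypothetical helper)."""
    values = list(activations)
    if not values:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in values if a > threshold)
    return 100.0 * firing / len(values)


# Hypothetical data: the feature fires on 203 of 100,000 tokens.
acts = [1.0] * 203 + [0.0] * (100_000 - 203)
print(f"{activation_density(acts):.3f}%")  # 0.203%
```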