INDEX
Explanations
references to social dynamics and interpersonal relationships
Negative Logits
daemon      -0.18
Kro         -0.15
tsky        -0.15
chts        -0.15
wap         -0.15
engo        -0.14
ssue        -0.14
-gnu        -0.14
_station    -0.14
免          -0.14
Positive Logits
instead     0.23
now         0.22
instead     0.22
artık       0.21
Gone        0.18
теперь      0.18
Instead     0.18
increased   0.17
niż         0.17
Instead     0.17
Activations Density 0.221%