Explanations
references to addiction and its related issues
Negative Logits
entanto  -0.77
however  -0.76
甚至  -0.68
jopa  -0.66
sogar  -0.64
however  -0.64
dokonce  -0.63
bahkan  -0.60
thậm  -0.60
hatta  -0.58
Positive Logits
przecież  1.01
ohnehin  0.99
already  0.94
畢竟  0.92
inherently  0.91
Хьажоргаш  0.88
毕竟  0.87
already  0.87
이미  0.81
Schließlich  0.80
Activations Density: 0.740%