INDEX
Explanations
references to the natural environment and its features
Negative Logits
somebody           -0.79
сылкі              -0.74
πως                -0.71
anybody            -0.68
sort               -0.66
doings             -0.65
kinda              -0.64
illetve            -0.62
disambiguazione    -0.62
everybody          -0.62
Positive Logits
NUMX     1.01
XNUMX    0.88
?!       0.70
.;       0.63
և        0.61
̵        0.59
!?       0.59
...)     0.58
.:       0.57
!!!      0.56
Activations Density: 0.242%