INDEX
Explanations
references to legal actions or decisions regarding individuals and their status
Negative Logits
poichè  -0.88
således  -0.78
endast  -0.73
几人  -0.73
ainfi  -0.72
깥  -0.72
אך  -0.70
yalnızca  -0.68
feroit  -0.68
آنان  -0.67
Positive Logits
really  1.07
somebody  1.06
sort  1.06
kind  1.03
basically  0.98
gonna  0.97
everybody  0.97
yeah  0.93
maybe  0.93
somebody  0.93
Activations Density: 1.477%