INDEX
Explanations
negations and contractions (isn't, didn't)
Negative Logits
electronic        0.59
2                 0.59
3                 0.57
1                 0.57
addition          0.54
-                 0.54
7                 0.54
எனப்படும்          0.53
5                 0.53
(no token text)   0.52
Positive Logits
Admittedly        0.71
Policing          0.71
Elektrokh         0.71
Enseñanza         0.69
ᆹ                 0.69
Nobody            0.68
Didn              0.67
Shouldn           0.67
Communism         0.67
Minimalism        0.66
Activations Density 0.630%