Explanations
statements of belief, opinion, or emotional expression
Negative Logits
ezier       -0.16
jang        -0.16
.intellij   -0.15
女子        -0.14
transf      -0.14
ç¼          -0.14
ряд         -0.14
ange        -0.14
.libs       -0.14
.reducer    -0.14
Positive Logits
personally  0.27
personal    0.17
們          0.16
Daly        0.16
/cop        0.15
лич         0.15
himself     0.15
personal    0.15
Personally  0.15
Hed         0.15
Activations Density 0.126%
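
The density figure above is presumably the share of token positions on which the feature fires. The page does not define it, so the sketch below rests on that assumption: an activation counts as "firing" when it exceeds a threshold of zero. The function name `activation_density`, the `threshold` parameter, and the sample values are hypothetical and are not taken from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: "density" means (active positions) / (all positions).
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical activations of one feature over eight token positions.
sample = [0.0, 0.0, 1.3, 0.0, 0.0, 0.0, 0.0, 0.0]
print(f"{activation_density(sample):.3%}")
```

Under this reading, a density of 0.126% would mean the feature is active on roughly one token in every 800.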