Explanations
interrogative and conditional phrases related to questioning or uncertainty
Negative Logits
dera           -0.16
.Management    -0.15
mdb            -0.15
_HI            -0.14
_typeof        -0.14
گرفته          -0.14
اده            -0.14
aría           -0.13
dere           -0.13
cona           -0.13
Positive Logits
did            1.13
Did            1.05
Did            1.01
did            0.99
DID            0.87
.did           0.80
didn           0.71
Didn           0.64
didnt          0.63
didn           0.59
Activations Density 0.317%